summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [GlobalISel] Enable legalizing non-power-of-2 sized types.Kristof Beyls2017-11-076-209/+554
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes the interface of how targets describe how to legalize, see the below description. 1. Interface for targets to describe how to legalize. In GlobalISel, the API in the LegalizerInfo class is the main interface for targets to specify which types are legal for which operations, and what to do to turn illegal type/operation combinations into legal ones. For each operation the type sizes that can be legalized without having to change the size of the type are specified with a call to setAction. This isn't different to how GlobalISel worked before. For example, for a target that supports 32 and 64 bit adds natively: for (auto Ty : {s32, s64}) setAction({G_ADD, 0, s32}, Legal); or for a target that needs a library call for a 32 bit division: setAction({G_SDIV, s32}, Libcall); The main conceptual change to the LegalizerInfo API, is in specifying how to legalize the type sizes for which a change of size is needed. For example, in the above example, how to specify how all types from i1 to i8388607 (apart from s32 and s64 which are legal) need to be legalized and expressed in terms of operations on the available legal sizes (again, i32 and i64 in this case). Before, the implementation only allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0, s128}, NarrowScalar). A worse limitation was that if you'd wanted to specify how to legalize all the sized types as allowed by the LLVM-IR LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times and probably would need a lot of memory to store all of these specifications. Instead, the legalization actions that need to change the size of the type are specified now using a "SizeChangeStrategy". For example: setLegalizeScalarToDifferentSizeStrategy( G_ADD, 0, widenToLargerAndNarrowToLargest); This example indicates that for type sizes for which there is a larger size that can be legalized towards, do it by Widening the size. For example, G_ADD on s17 will be legalized by first doing WidenScalar to make it s32, after which it's legal. The "NarrowToLargest" indicates what to do if there is no larger size that can be legalized towards. E.g. G_ADD on s92 will be legalized by doing NarrowScalar to s64. Another example, taken from the ARM backend is: for (unsigned Op : {G_SDIV, G_UDIV}) { setLegalizeScalarToDifferentSizeStrategy(Op, 0, widenToLargerTypesUnsupportedOtherwise); if (ST.hasDivideInARMMode()) setAction({Op, s32}, Legal); else setAction({Op, s32}, Libcall); } For this example, G_SDIV on s8, on a target without a divide instruction, would be legalized by first doing action (WidenScalar, s32), followed by (Libcall, s32). The same principle is also followed for when the number of vector lanes on vector data types need to be changed, e.g.: setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal); setLegalizeVectorElementToDifferentSizeStrategy( G_ADD, 0, widenToLargerTypesUnsupportedOtherwise); As currently implemented here, vector types are legalized by first making the vector element size legal, followed by then making the number of lanes legal. The strategy to follow in the first step is set by a call to setLegalizeVectorElementToDifferentSizeStrategy, see example above. The strategy followed in the second step "moreToWiderTypesAndLessToWidest" (see code for its definition), indicating that vectors are widened to more elements so they map to natively supported vector widths, or when there isn't a legal wider vector, split the vector to map it to the widest vector supported. Therefore, for the above specification, some example legalizations are: * getAction({G_ADD, LLT::vector(3, 3)}) returns {WidenScalar, LLT::vector(3, 8)} * getAction({G_ADD, LLT::vector(3, 8)}) then returns {MoreElements, LLT::vector(8, 8)} * getAction({G_ADD, LLT::vector(20, 8)}) returns {FewerElements, LLT::vector(16, 8)} 2. Key implementation aspects. How to legalize a specific (operation, type index, size) tuple is represented by mapping intervals of integers representing a range of size types to an action to take, e.g.: setScalarAction({G_ADD, LLT:scalar(1)}, {{1, WidenScalar}, // bit sizes [ 1, 31[ {32, Legal}, // bit sizes [32, 33[ {33, WidenScalar}, // bit sizes [33, 64[ {64, Legal}, // bit sizes [64, 65[ {65, NarrowScalar} // bit sizes [65, +inf[ }); Please note that most of the code to do the actual lowering of non-power-of-2 sized types is currently missing, this is just trying to make it possible for targets to specify what is legal, and how non-legal types should be legalized. Probably quite a bit of further work is needed in the actual legalizing and the other passes in GlobalISel to support non-power-of-2 sized types. I hope the documentation in LegalizerInfo.h and the examples provided in the various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well enough how this is meant to be used. This drops the need for LLT::{half,double}...Size(). Differential Revision: https://reviews.llvm.org/D30529 llvm-svn: 317560
* [CGP] Disable Select instruction handling in optimizeMemoryInst. NFCSerguei Katkov2017-11-071-1/+1
| | | | | | | | | | | | | | | | | | This patch disables the handling of selects in optimization extensing scope of optimizeMemoryInst. The optimization itself is disable by default. The idea here is just to switch optimiztion level step by step. Specifically, first optimization will be enabled only for Phi nodes, then select instructions will be added. In case someone will complain about perfromance it will be easier to detect what part of optimizations is responsible for that. Differential Revision: https://reviews.llvm.org/D36073 llvm-svn: 317555
* [X86] Don't clobber reserved registers with stack adjustmentsBjorn Steinbrink2017-11-071-0/+5
| | | | | | | | | | | | | | | | | | | | | Summary: Calls using invoke in funclet based functions are assumed to clobber all registers, which causes the stack adjustment using pops to consider all registers not defined by the call to be undefined, which can unfortunately include the base pointer, if one is needed. To prevent this (and possibly other hazards), skip reserved registers when looking for candidate registers. This fixes issue #45034 in the Rust compiler. Reviewers: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39636 llvm-svn: 317551
* [X86] Add patterns to fold a 64-bit load into the EVEX vcvtph2ps instructions.Craig Topper2017-11-071-7/+16
| | | | llvm-svn: 317548
* [X86] Add patterns for folding a v16i8 with the VEX vcvtph2ps intrinsics.Craig Topper2017-11-071-2/+4
| | | | | | Disable the peephole pass to prove that the pattern is working. llvm-svn: 317547
* [X86] Add support for using EVEX instructions for the legacy vcvtph2ps ↵Craig Topper2017-11-076-28/+40
| | | | | | | | intrinsics. Looks like there's some missed load folding opportunities for i64 loads. llvm-svn: 317544
* [X86] Use IMPLICIT_DEF in VEX/EVEX vcvtss2sd/vcvtsd2ss patterns instead of a ↵Craig Topper2017-11-072-4/+4
| | | | | | | | COPY_TO_REGCLASS. ExeDepsFix pass should take care of making the registers match. llvm-svn: 317542
* [X86] Remove 'Requires' from instructions with no patterns. NFCCraig Topper2017-11-071-6/+3
| | | | llvm-svn: 317541
* [Support/UNIX] posix_fallocate() can fail with EINVAL.Davide Italiano2017-11-071-1/+1
| | | | | | | | | | | | | | | | | According to the docs on opegroup.org, the function can return EINVAL if: The len argument is less than zero, or the offset argument is less than zero, or the underlying file system does not support this operation. I'd say it's a peculiar choice (when EONOTSUPP is right there), but let's keep POSIX happy for now. This was independently discovered by Mark Millard (on FreeBSD/ZFS). Quickly ack'ed by Rui on IRC. llvm-svn: 317535
* Make DIExpression::createFragmentExpression() return an Optional.Adrian Prantl2017-11-075-18/+37
| | | | | | | We can't safely split arithmetic into multiple fragments because we can't express carry-over between fragments. llvm-svn: 317534
* [IPO/LowerTypesTest] Skip blockaddress(es) when replacing uses.Davide Italiano2017-11-072-1/+23
| | | | | | | | | | | | | Blockaddresses refer to the function itself, therefore replacing them would cause an assertion in doRAUW. Fixes https://bugs.llvm.org/show_bug.cgi?id=35201 This was found when trying CFI on a proprietary kernel by Dmitry Mikulin. Differential Revision: https://reviews.llvm.org/D39695 llvm-svn: 317527
* AMDGPU: Remove redundant combineMatt Arsenault2017-11-072-39/+0
| | | | | | | | | | | | | | | | | | | | This combine was already done in two places. The generic combiner already has done this since r217610, for adds (with a single use). This one was added in r303641, and added support for handling or as well. r313251 later added support to the generic combine for or. It also turns out the isOrEquivalentToAdd check is not necessary for this combine. Additionally, we already reproduce this combine in yet another place in the backend, although in that version multiple uses of the add are still folded if it will allow a fold into the addressing mode. That version needs to be improved to understand ors though, as well as the correct legal offsets for private. llvm-svn: 317526
* [DebugInfo] Unify logic to merge DILocations. NFC.Vedant Kumar2017-11-062-19/+28
| | | | | | | | | | | | This makes DILocation::getMergedLocation() do what its comment says it does when merging locations for an Instruction: set the common inlineAt scope. This simplifies Instruction::applyMergedLocation() a bit. Testing: check-llvm, check-clang Differential Revision: https://reviews.llvm.org/D39628 llvm-svn: 317524
* [Support][Chrono] Use explicit cast of text output of time values.Simon Dardis2017-11-061-3/+3
| | | | | | | | | | | | | | | | | | rL316419 exposed a platform specific issue where the type of the values passed to llvm::format could be different to the format string. Debian unstable for mips uses long long int for std::chrono:duration, while x86_64 uses long int. For mips, this resulted in the value being corrupted when rendered to a string. Address this by explicitly casting the result of the duration_cast to the type specified in the format string. Reviewers: sammccall Differential Revision: https://reviews.llvm.org/D39597 llvm-svn: 317523
* InstCombine: salvage the debug info of DCE'ed add instructions.Adrian Prantl2017-11-061-12/+23
| | | | | | rdar://problem/31209283 llvm-svn: 317522
* [X86] Make FeatureAVX512 imply FeatureF16C.Craig Topper2017-11-064-36/+5
| | | | | | | | | | The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available. Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns. All known CPUs with AVX512 have F16C so this should safe for now. llvm-svn: 317521
* [X86] Make FeatureAVX512 imply FeatureFMA.Craig Topper2017-11-062-5/+5
| | | | | | | | | | | | Previously our VEX patterns were checking Subtarget.hasFMA() which checked FMA || AVX512. So we were behaving as if AVX512 implied it anyway. Which means we'd allow VEX encoded 128/256 FMA when AVX512F was enabled but AVX512VL is off. Regardless of the FMA flag. EVEX to VEX also transforms scalar EVEX FMA instructions to their VEX versions even without the FMA flag. Similarly for 128/256 under AVX512VL. So this makes AVX512 imply FeatureFMA to make our current behavior explicit. All known CPUs that support AVX512 have VEX FMA instructions. llvm-svn: 317520
* [ValueTracking] readonly (const) is a requirement for converting sqrt to ↵Sanjay Patel2017-11-061-3/+1
| | | | | | | | | | | | | | | | | | | | | | llvm.sqrt; nnan is not As discussed in D39204, this is effectively a revert of rL265521 which required nnan to vectorize sqrt libcalls based on the old LangRef definition of llvm.sqrt. Now that the definition has been updated so the libcall and intrinsic have the same semantics apart from potentially setting errno, we can remove the nnan requirement. We have the right check to know that errno is not set: if (!ICS.onlyReadsMemory()) ...ahead of the switch. This will solve https://bugs.llvm.org/show_bug.cgi?id=27435 assuming that's being built for a target with -fno-math-errno. Differential Revision: https://reviews.llvm.org/D39642 llvm-svn: 317519
* Revert r317510 "[InstCombine] Pull shifts through a select plus binop with ↵Hans Wennborg2017-11-061-82/+27
| | | | | | | | | | | | | | | | constant" This broke the CodeGen/Hexagon/loop-idiom/pmpy-mod.ll test on a bunch of buildbots. > This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select. > > This can allow us to get the select closer to other selects to enable removing one. > > Differential Revision: https://reviews.llvm.org/D39222 > > git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317510 91177308-0d34-0410-b5e6-96231b3b80d8 llvm-svn: 317518
* Fix comment /NFCXinliang David Li2017-11-061-3/+4
| | | | llvm-svn: 317514
* [MIRPrinter] Use %subreg.xxx syntax for subregister index operandsBjorn Pettersson2017-11-061-8/+13
| | | | | | | | | | | | | | | | | | Summary: Print %subreg.<subregidxname> instead of just the subregister index when printing immediate operands corresponding to subreg indices in INSERT_SUBREG, EXTRACT_SUBREG, SUBREG_TO_REG and REG_SEQUENCE. Reviewers: qcolombet, MatzeB Reviewed By: MatzeB Subscribers: nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D39696 llvm-svn: 317513
* [InstCombine] Pull shifts through a select plus binop with constantCraig Topper2017-11-061-27/+82
| | | | | | | | | | This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select. This can allow us to get the select closer to other selects to enable removing one. Differential Revision: https://reviews.llvm.org/D39222 llvm-svn: 317510
* Fix buildbot breakages from r317503. Add parentheses to assignment when ↵Graham Yiu2017-11-061-2/+2
| | | | | | using result as a condition. llvm-svn: 317508
* Adds code to PPC ISEL lowering to recognize byte inserts from ↵Graham Yiu2017-11-063-3/+117
| | | | | | | | vector_shuffles, and use P9 shift and vector insert byte instructions instead of vperm. Extends tests from vector insert half-word. Differential Revision: https://reviews.llvm.org/D34497 llvm-svn: 317503
* Include already promoted counts when computing SUM for VP.Dehao Chen2017-11-061-14/+12
| | | | | | | | | | | | | | Summary: When computing the SUM for indirect call promotion, if the callsite is already promoted in the profile, it will be promoted before ICP. In the current implementation, ICP only sees remaining counts in SUM. This may cause extra indirect call targets being promoted. This patch updates the SUM to include the counts already promoted earlier. This way we do not end up promoting too many indirect call targets. Reviewers: tejohnson Reviewed By: tejohnson Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D38763 llvm-svn: 317502
* [PPC] Use xxbrd to speed up bswap64Guozhi Wei2017-11-062-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | Power doesn't have bswap instructions, so llvm generates following code sequence for bswap64. rotldi 5, 3, 16 rotldi 4, 3, 8 rotldi 9, 3, 24 rotldi 10, 3, 32 rotldi 11, 3, 48 rotldi 12, 3, 56 rldimi 4, 5, 8, 48 rldimi 4, 9, 16, 40 rldimi 4, 10, 24, 32 rldimi 4, 11, 40, 16 rldimi 4, 12, 48, 8 rldimi 4, 3, 56, 0 But Power9 has vector bswap instructions, they can also be used to speed up scalar bswap intrinsic. With this patch, bswap64 can be translated to: mtvsrdd 34, 3, 3 xxbrd 34, 34 mfvsrld 3, 34 Differential Revision: https://reviews.llvm.org/D39510 llvm-svn: 317499
* AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32Matt Arsenault2017-11-065-13/+90
| | | | llvm-svn: 317492
* [IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' ↵Sanjay Patel2017-11-0619-67/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fast-math-flag As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html and again more recently: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html ...this is a step in cleaning up our fast-math-flags implementation in IR to better match the capabilities of both clang's user-visible flags and the backend's flags for SDNode. As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation - 'AllowReassoc'. We're also adding a bit to allow approximations for library functions called 'ApproxFunc' (this was initially proposed as 'libm' or similar). ...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day. We keep the 'fast' keyword. I thought about removing that, but seeing IR like this: %f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2 ...made me think we want to keep the shortcut synonym. Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. This statement: "Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR." ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility ) ...provides the flexibility we want to make this change without requiring a new IR version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'. Note: an inter-dependent clang commit to use the new API name should closely follow commit. Differential Revision: https://reviews.llvm.org/D39304 llvm-svn: 317488
* [X86][SSE] Merge combineExtractVectorElt_SSE into combineExtractVectorElt. NFCI.Simon Pilgrim2017-11-061-12/+8
| | | | | | We still early-out for X86ISD::PEXTRW/X86ISD::PEXTRB so no actual change in behaviour, but it'll make it easier to add support in a future patch. llvm-svn: 317485
* [X86][SSE] Combine EXTRACT_VECTOR_ELT with combineExtractWithShuffle before ↵Simon Pilgrim2017-11-061-2/+2
| | | | | | | | | | XFormVExtractWithShuffleIntoLoad combineExtractWithShuffle can handle more complex shuffles/bitcasts than we can with the equivalent code in XFormVExtractWithShuffleIntoLoad. Mainly a compile time improvement now (combineExtractWithShuffle combines will have always failed late on inside XFormVExtractWithShuffleIntoLoad), and will let us merge combineExtractVectorElt_SSE in a future commit. llvm-svn: 317481
* [AMDGPU] Change alloca addr space of r600 to 5 for amdgiz environmentYaxun Liu2017-11-061-0/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D39657 llvm-svn: 317479
* [SystemZ] implement hasDivRemOp()Jonas Paulsson2017-11-062-0/+6
| | | | | | | | | SystemZ can do division and remainder in a single instruction for scalar integer types, which are now reflected by returning true in this hook for those cases. Review: Ulrich Weigand llvm-svn: 317477
* [AMDGPU] Fix assertion due to assuming pointer in default addr space is 32 bitYaxun Liu2017-11-061-5/+10
| | | | | | | | | | | | The backend assumes pointer in default addr space is 32 bit, which is not true for the new addr space mapping and causes assertion for unresolved functions. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39643 llvm-svn: 317476
* [mips] Add movep for microMIPS32R6 and fix microMIPS32r3 versionSimon Dardis2017-11-068-5/+51
| | | | | | | | | | | | | | | | | | | Previously, the 'movep' instruction was defined for microMIPS32r3 and shared that definition with microMIPS32R6. 'movep' was re-encoded for microMIPS32r6, so this patch provides the correct encoding. Secondly, correct the encoding of the 'rs' and 'rt' operands which have an instruction specific encoding for the registers those operands accept. Finally, correct the decoding of the 'dst_regs' operand which was extracting the relevant field from the instruction, but was actually extracting the field from the alreadly extracted field. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39495 llvm-svn: 317475
* [LV][X86] update the cost of interleaving mem. access of floatsMohammed Agabaria2017-11-061-1/+4
| | | | | | | | | | Recommit: This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. fixed the location of the lit test it works with make check-all. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317471
* [mips] Fix PR35140Simon Dardis2017-11-061-4/+4
| | | | | | | | | | | | | | Mark all symbols involved with TLS relocations as being TLS symbols. This resolves PR35140. Thanks to Alex Crichton for reporting the issue! Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39591 llvm-svn: 317470
* [X86][AVX512] Improve lowering of AVX512 test intrinsicsUriel Korach2017-11-062-4/+20
| | | | | | | | | | | | Added TESTM and TESTNM to the list of instructions that already zeroing unused upper bits and does not need the redundant shift left and shift right instructions afterwards. Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq,and(X,Y), 0) goes folds into TESTM and icmp(eq,and(X,Y), 0) goes folds into TESTNM This commit is a preparation for lowering the test and testn X86 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38732 llvm-svn: 317465
* [X86] Replace duplicate function call with variable. NFCUriel Korach2017-11-061-2/+2
| | | | | | | | | | | | | Change from: if (N->getOperand(0).getValueType() == MVT::v8i32 || N->getOperand(0).getValueType() == MVT::v8f32) to: EVT OpVT = N->getOperand(0).getValueType(); if (OpVT == MVT::v8i32 || OpVT == MVT::v8f32) Change-Id: I5a105f8710b73a828e6cfcd55fac2eae6153ce25 llvm-svn: 317464
* X86 ISel: Basic support for variable-index vector permutationsZvi Rackover2017-11-061-0/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Try to lower a BUILD_VECTOR composed of extract-extract chains that can be reasoned to be a permutation of a vector by indices in a non-constant vector. We saw this pattern created by ISPC, which resolts to creating it due to the requirement that shufflevector's mask operand be a *constant* vector. I didn't check this but we could possibly use this pattern for lowering the X86 permute C-instrinsics instead of llvm.x86 instrinsics. This change can be followed by more improvements: 1. Handle vectors with undef elements. 2. Utilize pshufb and zero-mask-blending to support more effiecient construction of vectors with constant-0 elements. 3. Use smaller-element vectors of same width, and "interpolate" the indices, when no native operation available. Reviewers: RKSimon, craig.topper Reviewed By: RKSimon Subscribers: chandlerc, DavidKreitzer Differential Revision: https://reviews.llvm.org/D39126 llvm-svn: 317463
* Revert "adding a pattern for broadcastm"Jina Nahias2017-11-061-2/+2
| | | | | | | This reverts commit r317457. Change-Id: If07f1fca1e3453d16c1dac906e87768661384e91 llvm-svn: 317462
* [ObjectYAML] Map relocation types for COFF ARMNT and ARM64Martin Storsjo2017-11-061-0/+48
| | | | | | Differential Revision: https://reviews.llvm.org/D39668 llvm-svn: 317459
* [x86][AVX512] Lowering Broadcastm intrinsics to LLVM IRJina Nahias2017-11-063-18/+27
| | | | | | | | | This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38684 Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540 llvm-svn: 317458
* adding a pattern for broadcastmJina Nahias2017-11-061-2/+2
| | | | | Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5 llvm-svn: 317457
* [X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible.Craig Topper2017-11-062-39/+49
| | | | llvm-svn: 317454
* [X86] Add scalar FMA ISD nodes without rounding mode. NFCCraig Topper2017-11-065-37/+92
| | | | | | Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers. llvm-svn: 317453
* [X86] Use EVEX encoded instructions for legacy scalar sqrt intrinsics.Craig Topper2017-11-062-10/+19
| | | | | | Fixes PR35161. llvm-svn: 317445
* [PassManager, SimplifyCFG] Revert r316908 and r316869.David L. Jones2017-11-061-7/+2
| | | | | | These cause Clang to crash with a segfault. See PR35210 for details. llvm-svn: 317444
* [X86] Add missing predicate to a pattern. NFCCraig Topper2017-11-051-0/+2
| | | | | | Other patterns had higher priority so this wasn't noticed. But we shouldn't be dependent on pattern order. llvm-svn: 317442
* [X86] Remove some more RCP and RSQRT patterns from InstrAVX512.td that I ↵Craig Topper2017-11-052-25/+12
| | | | | | missed in r317413. llvm-svn: 317441
* [X86] Fix outdated comment. NFCCraig Topper2017-11-051-1/+1
| | | | llvm-svn: 317440
OpenPOWER on IntegriCloud