summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [DAGCombiner] re-enable truncation of binopsSanjay Patel2018-12-081-12/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is effectively re-committing the changes from: rL347917 (D54640) rL348195 (D55126) ...which were effectively reverted here: rL348604 ...because the code had a bug that could induce infinite looping or eventual out-of-memory compilation. The bug was that this code did not guard against transforming opaque constants. More details are in the post-commit mailing list thread for r347917. A reduced test for that is included in the x86 bool-math.ll file. (I wasn't able to reduce a PPC backend test for this, but it was almost the same pattern.) Original commit message for r347917: The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. llvm-svn: 348706
* [WebAssembly] Make WasmSymbol's signature usable for events (NFC)Heejin Ahn2018-12-081-5/+7
| | | | | | | | | | | | | | Summary: WasmSignature used to use its `WasmSignature` member variable only for function types, but now it also can be used for events as well. Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55247 llvm-svn: 348702
* [SelectionDAG] Remove ISD::ADDC/ADDE from some undef handling code in ↵Craig Topper2018-12-081-2/+0
| | | | | | | | getNode. NFCI These nodes should have two results. A real VT and a Glue. But this code would have returned Undef which would only be a single result. But we're in the single result version of getNode so these opcodes should never be seen by this function anyway. llvm-svn: 348670
* AMDGPU: Fix offsets for < 4-byte aggregate kernel argumentsMatt Arsenault2018-12-071-4/+7
| | | | | | | | We were still using the rounded down offset and alignment even though they aren't handled because you can't trivially bitcast the loaded value. llvm-svn: 348658
* [GlobalISel] Add IR translation support for the @llvm.log10 intrinsicJessica Paquette2018-12-071-0/+5
| | | | | | | | This adds IR translation support for @llvm.log10 and updates relevant tests. https://reviews.llvm.org/D55392 llvm-svn: 348657
* [Hexagon] Fix post-ra expansion of PS_wselectKrzysztof Parzyszek2018-12-071-1/+0
| | | | llvm-svn: 348655
* [ModuleSummary] use StringRefs to avoid a redundant copy; NFCGeorge Burgess IV2018-12-071-1/+1
| | | | | | | | `Saver` is a StringSaver, which has a few overloads of `save` that all ultimately just call `StringRef save(StringRef)`. Just take a StringRef here instead of building up a std::string to convert it to a StringRef. llvm-svn: 348650
* Fix unused variable warning. NFCI.Simon Pilgrim2018-12-071-2/+2
| | | | llvm-svn: 348649
* [WebAssembly] clang-format/clang-tidy AsmParser (NFC)Heejin Ahn2018-12-071-60/+88
| | | | | | | | | | | | | | | | | | | Summary: - LLVM clang-format style doesn't allow one-line ifs. - LLVM clang-tidy style says method names should start with a lowercase letter. But currently WebAssemblyAsmParser's parent class MCTargetAsmParser is mixing lowercase and uppercase method names itself so overridden methods cannot be renamed now. - Changed else ifs after returns to ifs. - Added some newlines for readability. Reviewers: aardappel, sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55350 llvm-svn: 348648
* Delete registerScope functionHeejin Ahn2018-12-071-20/+2
| | | | | | `unregisterScope()` is not currently used, so removing it. llvm-svn: 348647
* Follow-up from r348441 to add the rest of the objc ARC intrinsics.Pete Cooper2018-12-071-0/+14
| | | | | | This adds the other intrinsics used by ARC and codegen's them to their respective runtime methods. llvm-svn: 348646
* [MemCpyOpt] memset->memcpy forwarding with undef tailNikita Popov2018-12-072-16/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. For the implementation, I'm reusing a bit of code for a similar existing optimization (direct memcpy of undef). I've also added memset support to MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be used, but it seems to make more sense to add this to GetLocation and thus make the computation cachable. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 348645
* [HotColdSplitting] Refine definition of unlikelyExecutedVedant Kumar2018-12-071-24/+16
| | | | | | | | | | | | | | | | | | | | | | | The splitting pass uses its 'unlikelyExecuted' predicate to statically decide which blocks are cold. - Do not treat noreturn calls as if they are cold unless they are actually marked cold. This is motivated by functions like exit() and longjmp(), which are not beneficial to outline. - Do not treat inline asm as an outlining barrier. In practice asm("") is frequently used to inhibit basic block merging; enabling outlining in this case results in substantial memory savings. - Treat invokes of cold functions as cold. As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's no longer needed. The pass can identify & outline blocks dominated by EH pads, so there's no need to special-case __cxa_begin_catch etc. Differential Revision: https://reviews.llvm.org/D54244 llvm-svn: 348640
* [HotColdSplitting] Outline more than once per functionVedant Kumar2018-12-071-165/+287
| | | | | | | | | | | | | | | | | | Algorithm: Identify maximal cold regions and put them in a worklist. If a candidate region overlaps with another, discard it. While the worklist is full, remove a single-entry sub-region from the worklist and attempt to outline it. By the non-overlap property, this should not invalidate parts of the domtree pertaining to other outlining regions. Testing: LNT results on X86 are clean. With test-suite + externals, llvm outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file 483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm outlines over 100 times in some functions (mostly EH paths). There was not a significant performance impact pre vs. post-patch. Differential Revision: https://reviews.llvm.org/D53887 llvm-svn: 348639
* [NativePDB] Reconstruct function declarations from debug info.Zachary Turner2018-12-071-1/+12
| | | | | | | | | | | | Previously we would create an lldb::Function object for each function parsed, but we would not add these to the clang AST. This is a first step towards getting local variable support working, as we first need an AST decl so that when we create local variable entries, they have the proper DeclContext. Differential Revision: https://reviews.llvm.org/D55384 llvm-svn: 348631
* [llvm-tapi] Don't try to override SequenceTraits for std::stringSam Clegg2018-12-071-17/+0
| | | | | | | | | | | | | | | | For some reason this doesn't seem to work with LLVM_LINK_LLVM_DYLIB build. See https://logs.chromium.org/logs/chromium/bb/client.wasm.llvm/linux/37764/+/recipes/steps/LLVM_regression_tests/0/stdout What is more it seems that overriding these traits for core types (including std::string) is not supported/recommend by YAMLTraits.h. See line 1918 which has the assertion: "only use LLVM_YAML_IS_SEQUENCE_VECTOR for types you control" Differential Revision: https://reviews.llvm.org/D55381 llvm-svn: 348630
* [DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFCSanjay Patel2018-12-071-33/+48
| | | | | | | This duplicates several shared checks, but we need to split this up to fix underlying bugs in smaller steps. llvm-svn: 348627
* [X86] Replace instregex with instrs list. NFCI. Simon Pilgrim2018-12-073-3/+3
| | | | llvm-svn: 348626
* AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.loadMatt Arsenault2018-12-072-5/+12
| | | | llvm-svn: 348625
* [X86] Initialize and Register X86CondBrFoldingPassCraig Topper2018-12-073-4/+8
| | | | | | | | | | To make X86CondBrFoldingPass can be run with --run-pass option, this can test one wrong assertion on analyzeCompare function for SUB32ri when its operand is not imm Patch by Jianping Chen Differential Revision: https://reviews.llvm.org/D55412 llvm-svn: 348620
* AMDGPU: Remove llvm.SI.tbuffer.storeMatt Arsenault2018-12-072-67/+0
| | | | llvm-svn: 348619
* [X86] Improve pfm counter coverage for llvm-exegesisSimon Pilgrim2018-12-071-0/+83
| | | | | | | | | | | | | | This patch attempts to improve pfm perf counter coverage for all the x86 CPUs that libpfm4 supports. Intel/AMD CPU families tend to share names for cycle/uops counters so even if they don't have a scheduler model yet they can at least use the default values (checked against the libpfm4 source code). The remaining CPUs (where their port/pipe resource counters are known) I've tried to add to the existing model mappings. These are untested but don't represent a regression to current llvm-exegesis behaviour for these CPUs. Differential Revision: https://reviews.llvm.org/D55432 llvm-svn: 348617
* AMDGPU: Remove llvm.SI.buffer.load.dwordMatt Arsenault2018-12-072-62/+0
| | | | llvm-svn: 348616
* AMDGPU: Remove llvm.AMDGPU.killMatt Arsenault2018-12-075-35/+5
| | | | | | This is the last of the old AMDGPU intrinsics. llvm-svn: 348615
* [DAGCombiner] disable truncation of binops by defaultSanjay Patel2018-12-071-1/+7
| | | | | | | | | | As discussed in the post-commit thread of r347917, this transform is fighting with an existing transform causing an infinite loop or out-of-memory, so this is effectively reverting r347917 and its follow-up r348195 while we investigate the bug. llvm-svn: 348604
* Reapply "[DemandedBits][BDCE] Support vectors of integers"Nikita Popov2018-12-072-27/+51
| | | | | | | | | | | | | | | | | | | | | | DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. Unlike the previous iteration of this patch, getDemandedBits() can now again be called on arbirary (sized) instructions, even if they don't have integer or vector of integer type. (For vector types the size of the returned mask will now be the scalar size in bits though.) The added LoopVectorize test case shows a case which triggered an assertion failure with the previous attempt, because getDemandedBits() was called on a pointer-typed instruction. Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348602
* [AMDGPU] Shrink scalar AND, OR, XOR instructionsGraham Sellers2018-12-071-0/+84
| | | | | | | | | | | | | | | This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable. It performs: AND s0, s0, ~(1 << n) -> BITSET0 s0, n OR s0, s0, (1 << n) -> BITSET1 s0, n AND s0, s1, x -> ANDN2 s0, s1, ~x OR s0, s1, x -> ORN2 s0, s1, ~x XOR s0, s1, x -> XNOR s0, s1, ~x In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7ffffffff -> bitset0 x, 31 and or x, 0x80000000 -> bitset1 x, 31). llvm-svn: 348601
* [DAGCombiner] remove explicit calls to AddToWorkList; NFCISanjay Patel2018-12-071-6/+0
| | | | | | | | As noted in the post-commit thread for rL347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html ...we don't need to repeat these calls because the combiner does it automatically. llvm-svn: 348597
* Introduce llvm.experimental.widenable_condition intrinsicMax Kazantsev2018-12-075-0/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new instinsic `@llvm.experimental.widenable_condition` that allows explicit representation for guards. It is an alternative to using `@llvm.experimental.guard` intrinsic that does not contain implicit control flow. We keep finding places where `@llvm.experimental.guard` is not supported or treated too conservatively, and there are 2 reasons to that: - `@llvm.experimental.guard` has memory write side effect to model implicit control flow, and this sometimes confuses passes and analyzes that work with memory; - Not all passes and analysis are aware of the semantics of guards. These passes treat them as regular throwing call and have no idea that the condition of guard may be used to prove something. One well-known place which had caused us troubles in the past is explicit loop iteration count calculation in SCEV. Another example is new loop unswitching which is not aware of guards. Whenever a new pass appears, we potentially have this problem there. Rather than go and fix all these places (and commit to keep track of them and add support in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible. The only significant difference between guards and regular explicit branches is that guard's condition can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor, and it is always legal to go there no matter what the guard condition is. The other successor is a guarded block, and it is only legal to go there if the condition is true. This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard` intrinsic. Now a widenable guard can be represented in the CFG explicitly like this: %widenable_condition = call i1 @llvm.experimental.widenable.condition() %new_condition = and i1 %cond, %widenable_condition br i1 %new_condition, label %guarded, label %deopt guarded: ; Guarded instructions deopt: call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ] The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an `undef`, but the intrinsic prevents the optimizer from folding it early. This form should exploit all optimization boons provided to `br` instuction, and it still can be widened by replacing the result of `@llvm.experimental.widenable.condition()` with `and` with any arbitrary boolean value (as long as the branch that is taken when it is `false` has a deopt and has no side-effects). For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using implicit control flow in guards". This patch introduces this new intrinsic with respective LangRef changes and a pass that converts old-style guards (expressed as intrinsics) into the new form. The naming discussion is still ungoing. Merging this to unblock further items. We can later change the name of this intrinsic. Reviewed By: reames, fedor.sergeev, sanjoy Differential Revision: https://reviews.llvm.org/D51207 llvm-svn: 348593
* ARM: use correct offset from base pointer (r6) in call frame regions.Tim Northover2018-12-071-0/+1
| | | | | | | | | When we had dynamic call frames (i.e. sp adjustment around each call) we were including that adjustment into offsets calculated based on r6, even though it's only sp that changes. This led to incorrect stack slot accesses. llvm-svn: 348591
* [Targets] Add errors for tiny and kernel codemodel on targets that don't ↵David Green2018-12-0719-119/+69
| | | | | | | | | | | support them Adds fatal errors for any target that does not support the Tiny or Kernel codemodels by rejigging the getEffectiveCodeModel calls. Differential Revision: https://reviews.llvm.org/D50141 llvm-svn: 348585
* Fix gcc7.3 -Wparentheses warning. NFCI.Simon Pilgrim2018-12-071-3/+3
| | | | llvm-svn: 348581
* [X86] Add ivybridge to llvm-exegesis PFM counter mappingsSimon Pilgrim2018-12-071-0/+1
| | | | llvm-svn: 348575
* [SelectionDAG] Don't pass on DemandedElts when handling SCALAR_TO_VECTORSimon Pilgrim2018-12-071-1/+1
| | | | | | | | | | | | Fixes an assertion: llc: lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2200: llvm::KnownBits llvm::SelectionDAG::computeKnownBits(llvm::SDValue, const llvm::APInt&, unsigned int) const: Assertion `(!Op.getValueType().isVector() || NumElts == Op.getValueType().getVectorNumElements()) && "Unexpected vector size"' failed. Committed on behalf of: @pendingchaos (Rhys Perry) Differential Revision: https://reviews.llvm.org/D55223 llvm-svn: 348574
* [IR] Don't assume all functions are 4 byte alignedRanjeet Singh2018-12-071-4/+5
| | | | | | | | | | | | | | | | | In some cases different alignments for function might be used to save space e.g. thumb mode with -Oz will try to use 2 byte function alignment. Similar patch that fixed this in other areas exists here https://reviews.llvm.org/D46110 This was approved previously https://reviews.llvm.org/D55115 (r348215) but when committed it caused failures on the sanitizer buildbots when building llvm with clang (containing this patch). This is now fixed because I've added a check to see if getting the parent module returns null if it does then set the alignment to 0. Differential Revision: https://reviews.llvm.org/D55115 llvm-svn: 348571
* [PM] Port LoadStoreVectorizer to the new pass manager.Markus Lavin2018-12-074-16/+36
| | | | | | Differential Revision: https://reviews.llvm.org/D54848 llvm-svn: 348570
* [LoopSimplifyCFG] Do not deal with loops with irreducible CFG insideMax Kazantsev2018-12-071-0/+40
| | | | | | | | | | | | | | | | | | | | | The current algorithm that collects live/dead/inloop blocks relies on some invariants related to RPO and PO traversals. In particular, the important fact it requires is that the only loop's latch is the first block in PO traversal. It also relies on fact that during RPO we visit all prececessors of a block before we visit this block (backedges ignored). If a loop has irreducible non-loop cycle inside, both these assumptions may break. This patch adds detection for this situation and prohibits the terminator folding for loops with irreducible CFG. We can in theory support this later, for this some algorithmic changes are needed. Besides, irreducible CFG is not a frequent situation and we can just don't bother. Thanks @uabelho for finding this! Differential Revision: https://reviews.llvm.org/D55357 Reviewed By: skatkov llvm-svn: 348567
* [PowerPC] Fix assert from machine verify pass that missing undef register flagZi Xuan Wu2018-12-071-15/+11
| | | | | | | | | | | | | | | | | | | | Fix assert about using an undefined physical register in machine instruction verify pass. The reason is that register flag undef is missing when doing transformation from If Conversion Pass. ``` Bad machine code: Using an undefined physical register - function: func_65 - basic block: %bb.0 entry (0x10024740738) - instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3 - operand 0: killed $cr5lt LLVM ERROR: Found 1 machine code errors. ``` There are also other existing testcases with same issue. So I add -verify-machineinstrs option to open verifying. Differential Revision: https://reviews.llvm.org/D55408 llvm-svn: 348566
* [CodeExtractor] Store outputs at the first valid insertion pointVedant Kumar2018-12-071-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | When CodeExtractor outlines values which are used by the original function, it must store those values in some in-out parameter. This store instruction must not be inserted in between a PHI and an EH pad instruction, as that results in invalid IR. This fixes the following verifier failure seen while outlining within ObjC methods with live exit values: The unwind destination does not have an exception handling instruction! %call35 = invoke i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %exn.adjusted, i8* %1) to label %invoke.cont34 unwind label %lpad33, !dbg !4183 The unwind destination does not have an exception handling instruction! invoke void @objc_exception_throw(i8* %call35) #12 to label %invoke.cont36 unwind label %lpad33, !dbg !4184 LandingPadInst not the first non-PHI instruction in the block. %3 = landingpad { i8*, i32 } catch i8* null, !dbg !1411 rdar://46540815 llvm-svn: 348562
* Revert "[llvm-tapi] Don't override SequenceTraits for std::string"Armando Montanez2018-12-071-17/+12
| | | | | | Revert r348551 since it triggered some warnings that don't appear to have a quick fix. llvm-svn: 348560
* Revert "[DemandedBits][BDCE] Support vectors of integers"Nikita Popov2018-12-072-51/+28
| | | | | | | This reverts commit r348549. Causing assertion failures during clang build. llvm-svn: 348558
* [DAGCombiner] use root SDLoc for all nodes created by logic foldSanjay Patel2018-12-071-1/+1
| | | | | | | | | | | If this is not a valid way to assign an SDLoc, then we get this wrong all over SDAG. I don't know enough about the SDAG to explain this. IIUC, theoretically, debug info is not supposed to affect codegen. But here it has clearly affected 3 different targets, and the x86 change is an actual improvement. llvm-svn: 348552
* [llvm-tapi] Don't override SequenceTraits for std::stringArmando Montanez2018-12-061-12/+17
| | | | | | | | | | | | | | Change the ELF YAML implementation of TextAPI so NeededLibs uses flow sequence vector correctly instead of overriding the YAML implementation for std::vector<std::string>>. This should fix the test failure with the LLVM_LINK_LLVM_DYLIB build mentioned in D55381. Still passes existing tests that cover this. Differential Revision: https://reviews.llvm.org/D55390 llvm-svn: 348551
* [DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCISanjay Patel2018-12-061-1/+1
| | | | | | | | | | | | | | We shouldn't care about the debug location for a node that we're creating, but attaching the root of the pattern should be the best effort. (If this is not true, then we are doing it wrong all over the SDAG). This is no-functional-change-intended, and there are no regression test diffs...and that's what I expected. But there's a similar line above this diff, where those assumptions apparently do not hold. llvm-svn: 348550
* [DemandedBits][BDCE] Support vectors of integersNikita Popov2018-12-062-28/+51
| | | | | | | | | | | | | | | | | | | DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. The getDemandedBits() method can now only be called on instructions that have integer or vector of integer type. Previously it could be called on any sized instruction (even if it was not particularly useful). The size of the return value is now always the scalar size in bits (while previously it was the type size in bits). Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348549
* [DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFCSanjay Patel2018-12-061-41/+34
| | | | | | This code can still misbehave. llvm-svn: 348547
* [X86] Directly create ADC/SBB nodes instead of using ADD/SUB with (and ↵Craig Topper2018-12-062-47/+8
| | | | | | | | | | | | SETCC_CARRY, 1) This addresses a FIXME and avoids depending on an isel pattern match I think. I've remove the isel patterns too since he have no lit tests left that cover them. Hopefully that really means they are unused. I'm trying to decide if we need SETCC_CARRY. This removes one of its usages. Differential Revision: https://reviews.llvm.org/D55355 llvm-svn: 348536
* [DAGCombiner] don't group bswap with casts in logic hoisting foldSanjay Patel2018-12-061-6/+15
| | | | | | | | | | | | | | | | This was probably organized as it was because bswap is a unary op. But that's where the similarity to the other opcodes ends. We should not limit this transform to scalars, and we should not try it if either input has other uses. This is another step towards trying to clean this whole function up to prevent it from causing infinite loops and memory explosions. Earlier commits in this series: rL348501 rL348508 rL348518 llvm-svn: 348534
* [DAGCombiner] reduce indent; NFCSanjay Patel2018-12-061-38/+31
| | | | | | | | Unlike some of the folds in hoistLogicOpWithSameOpcodeHands() above this shuffle transform, this has the expected hasOneUse() checks in place. llvm-svn: 348523
* [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.Andrea Di Biagio2018-12-061-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes: concat_vectors( bitcast (scalar_to_vector %A), UNDEF) --> bitcast (scalar_to_vector %A) This patch only partially addresses PR39257. In particular, it is enough to fix one of the two problematic cases mentioned in PR39257. However, it is not enough to fix the original test case posted by Craig; that particular case would probably require a more complicated approach (and knowledge about used bits). Before this patch, we used to generate the following code for function PR39257 (-mtriple=x86_64 , -mattr=+avx): vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero vxorps %xmm1, %xmm1, %xmm1 vblendps $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3] vmovaps %ymm0, (%rsi) vzeroupper retq Now we generate this: vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero vmovaps %ymm0, (%rsi) vzeroupper retq As a side note: that VZEROUPPER is completely redundant... I guess the vzeroupper insertion pass doesn't realize that the definition of %xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on %-mcpu=btver2, we don't get that vzeroupper because pass vzeroupper insertion %pass is disabled. Differential Revision: https://reviews.llvm.org/D55274 llvm-svn: 348522
OpenPOWER on IntegriCloud