summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [InstCombine] Ensure nested shifts are in range (OSS-Fuzz #9880)Simon Pilgrim2018-11-061-5/+6
| | | | llvm-svn: 346225
* [Support] Fix `warning: unknown pragma ignored` for mingw targetMartin Storsjo2018-11-061-0/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D54133 llvm-svn: 346218
* [NFC] Turn collectTransitivePredecessors into a static functionMax Kazantsev2018-11-061-2/+5
| | | | llvm-svn: 346217
* [XRay] Update XRayRecord to support Custom/Typed EventsDean Michael Berris2018-11-064-7/+28
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This change cuts across LLVM and compiler-rt to add support for rendering custom events in the XRayRecord type, to allow for including user-provided annotations in the output YAML (as raw bytes). This work enables us to add custom event and typed event records into the `llvm::xray::Trace` type for user-provided events. This can then be programmatically handled through the C++ API and can be included in some of the tooling as well. For now we support printing the raw data we encounter in the custom events in the converted output. Future work will allow us to start interpreting these custom and typed events through a yet-to-be-defined API for extending the trace analysis library. Reviewers: mboerger Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D54139 llvm-svn: 346214
* [LICM] Remove too conservative IsMustExecute variableMax Kazantsev2018-11-061-15/+8
| | | | | | | | | | | | | | | | | | LICM relies on variable `MustExecute` which is conservatively set to `false` in all non-headers. It is used when we decide whether or not we want to hoist an instruction or a guard. For the guards, it might be too conservative to use this variable, we can instead use a more precise logic from LoopSafetyInfo. Currently it is only NFC because `IsMemoryNotModified` is also conservatively set to `false` for all non-headers, and we cannot hoist guards from non-header blocks. However once we give up using `IsMemoryNotModified` and use a smarter check instead, this will allow us to hoist guards from all mustexecute non-header blocks. Differential Revision: https://reviews.llvm.org/D50888 Reveiwed By: fedor.sergeev llvm-svn: 346204
* AArch64: Cleanup CCMP code; NFCMatthias Braun2018-11-061-29/+30
| | | | | | | | | | Cleanup CCMP pattern matching code in preparation for review/bugfix: - Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()` (it won't accept arbitrary disjunctions and is really about whether we can transform the subtree into a conjunction that we can emit). - Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()` llvm-svn: 346203
* [LICM] Use ICFLoopSafetyInfo in LICMMax Kazantsev2018-11-061-20/+33
| | | | | | | | | | | | | | | This patch makes LICM use `ICFLoopSafetyInfo` that is a smarter version of LoopSafetyInfo that leverages power of Implicit Control Flow Tracking to keep track of throwing instructions and give less pessimistic answers to queries related to throws. The ICFLoopSafetyInfo itself has been introduced in rL344601. This patch enables it in LICM only. Differential Revision: https://reviews.llvm.org/D50377 Reviewed By: apilipenko llvm-svn: 346201
* Revert "[IndVars] Smart hard uses detection"Max Kazantsev2018-11-061-26/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 2f425e9c7946b9d74e64ebbfa33c1caa36914402. It seems that the check that we still should do the transform if we know the result is constant is missing in this code. So the logic that has been deleted by this change is still sometimes accidentally useful. I revert the change to see what can be done about it. The motivating case is the following: @Y = global [400 x i16] zeroinitializer, align 1 define i16 @foo() { entry: br label %for.body for.body: ; preds = %entry, %for.body %i = phi i16 [ 0, %entry ], [ %inc, %for.body ] %arrayidx = getelementptr inbounds [400 x i16], [400 x i16]* @Y, i16 0, i16 %i store i16 0, i16* %arrayidx, align 1 %inc = add nuw nsw i16 %i, 1 %cmp = icmp ult i16 %inc, 400 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body %inc.lcssa = phi i16 [ %inc, %for.body ] ret i16 %inc.lcssa } We should be able to figure out that the result is constant, but the patch breaks it. Differential Revision: https://reviews.llvm.org/D51584 llvm-svn: 346198
* [LLVM-C] Fix Windows Build of CoreRobert Widmann2018-11-061-1/+1
| | | | | | | strndup doesn't exist outside of GNU-land and modern macOSes. Use strdup instead as c_str() is guaranteed to be NUL-terminated. llvm-svn: 346197
* [LLVM-C] Improve Intrinsics BindingsRobert Widmann2018-11-061-0/+44
| | | | | | | | | | | | | | | | | | | | | Summary: Improve the intrinsic bindings with operations for - Retrieving and automatically inserting the declaration of an intrinsic by ID - Retrieving the name of a non-overloaded intrinsic by ID - Retrieving the name of an overloaded intrinsic by ID and overloaded parameter types Improve the echo test to copy non-overloaded intrinsics by ID. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53626 llvm-svn: 346195
* Revert "[WebAssembly] Fixup `main` signature by default"Sam Clegg2018-11-061-1/+6
| | | | | | | | | | This reverts rL345880. It caused some test failures on the webassembly waterfall. e.g. binaryen2.test_mainenv fails due the fact that `envp` ends up being undef rather than 0. Differential Revision: https://reviews.llvm.org/D54117 llvm-svn: 346187
* MachineFunction: Store more specific reference to LLVMTargetMachine; NFCMatthias Braun2018-11-054-4/+5
| | | | | | | | | | MachineFunction can only be used in code using lib/CodeGen, hence we can keep a more specific reference to LLVMTargetMachine rather than just TargetMachine around. Do the same for references in ScheduleDAG and RegUsageInfoCollector. llvm-svn: 346183
* MachineModuleInfo: Store more specific reference to LLVMTargetMachine; NFCMatthias Braun2018-11-051-1/+1
| | | | | | | | MachineModuleInfo can only be used in code using lib/CodeGen, hence we can keep a more specific reference to LLVMTargetMachine rather than just TargetMachine around. llvm-svn: 346182
* [DWARF] Support types CU list in .gdb_index dumpingFangrui Song2018-11-051-3/+20
| | | | | | Some executables have non-empty types CU list and -gdb-index would report "<error reporting>" before. llvm-svn: 346181
* [TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take ↵Craig Topper2018-11-0513-18/+17
| | | | | | | | an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. llvm-svn: 346180
* AMDGPU: Add sram-ecc featureKonstantin Zhuravlyov2018-11-057-21/+28
| | | | | | Differential Revision: https://reviews.llvm.org/D53222 llvm-svn: 346177
* [X86] Don't turn any_extend from a mask register into a sign_extend during ↵Craig Topper2018-11-052-2/+14
| | | | | | | | | | lowering. Add patterns to match any_extend during isel instead. SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel. I don't have a reduced test case for such an infinite loop yet. llvm-svn: 346170
* [InstSimplify] fold select (fcmp X, Y), X, YSanjay Patel2018-11-052-50/+31
| | | | | | | | | This is NFCI for InstCombine because it calls InstSimplify, so I left the tests for this transform there. As noted in the code comment, we can allow this fold more often by using FMF and/or value tracking. llvm-svn: 346169
* [COFF][LLD] Add link support for Microsoft precompiled headers OBJsAlexandre Ganea2018-11-053-37/+82
| | | | | | | | | | | This change allows for link-time merging of debugging information from Microsoft precompiled types OBJs compiled with cl.exe /Z7 /Yc and /Yu. This fixes llvm.org/PR34278 Differential Revision: https://reviews.llvm.org/D45213 llvm-svn: 346154
* Only call FlushFileBuffers() when writing executables on WindowsAlexandre Ganea2018-11-052-1/+57
| | | | | | | | | | | | This is a follow-up for "r325274: Call FlushFileBuffers on output files." Previously, FlushFileBuffers() was called in all cases when writing a file. The objective was to go around a bug in the Windows kernel (as described here: https://randomascii.wordpress.com/2018/02/25/compiler-bug-linker-bug-windows-kernel-bug/). However that is required only when writing EXEs, any other file type doesn't need flushing. This patch calls FlushFileBuffers() only for EXEs. In addition, we completly disable FlushFileBuffers() for known Windows 10 versions that do not exhibit the original kernel bug. Differential Revision: https://reviews.llvm.org/D53727 llvm-svn: 346152
* [MergeICmps] Do not perform the transformation if GEP is used outside of blockTaewook Oh2018-11-051-1/+1
| | | | | | | | | | | | | | | | | Summary: This patch prevents MergeICmps to performn the transformation if the address operand GEP of the load instruction has a use outside of the load's parent block. Without this patch, compiler crashes with the given test case because the use of `%first.i` is still around when the basic block is erased from https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/MergeICmps.cpp#L620. I think checking `isUsedOutsideOfBlock` with `GEP` is the original intention of the code, as the checking for `LoadI` is already performed in the same function. This patch is incomplete though, as this makes the pass overly conservative and fails the test `tuple-four-int8.ll`. I believe what needs to be done is checking if GEP has a use outside of block that is not the part of "Comparisons" chain. Submit the patch as of now to prevent compiler crash. Reviewers: courbet, trentxintong Reviewed By: courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54089 llvm-svn: 346151
* [Power9] Add support for stxvw4x.be and stxvd2x.be intrinsicsZaara Syeda2018-11-051-4/+4
| | | | | | | | | | | | On Power9, we don't have patterns to select the following intrinsics: llvm.ppc.vsx.stxvw4x.be llvm.ppc.vsx.stxvd2x.be This patch adds support for these. Differential Revision: https://reviews.llvm.org/D53581 llvm-svn: 346148
* [InstCombine] canonicalize -0.0 to +0.0 in fcmpSanjay Patel2018-11-051-0/+7
| | | | | | | | | | | | | | | | | As stated in IEEE-754 and discussed in: https://bugs.llvm.org/show_bug.cgi?id=38086 ...the sign of zero does not affect any FP compare predicate. Known regressions were fixed with: rL346097 (D54001) rL346143 The transform will help reduce pattern-matching complexity to solve: https://bugs.llvm.org/show_bug.cgi?id=39475 ...as well as improve CSE and codegen (a zero constant is almost always easier to produce than 0x80..00). llvm-svn: 346147
* [InstCombine] loosen FP 0.0 constraint for fcmp+select substitutionSanjay Patel2018-11-051-4/+13
| | | | | | | | | | | | | | It looks like we correctly removed edge cases with 0.0 from D50714, but we were a bit conservative because getBinOpIdentity() doesn't distinguish between +0.0 and -0.0 and 'nsz' is effectively always true for fcmp (see discussion in: https://bugs.llvm.org/show_bug.cgi?id=38086 Without this change, we would get regressions by canonicalizing to +0.0 in all fcmp, and that's a step towards solving: https://bugs.llvm.org/show_bug.cgi?id=39475 llvm-svn: 346143
* [FPEnv] Add constrained CEIL/FLOOR/ROUND/TRUNC intrinsicsCameron McInally2018-11-058-0/+60
| | | | | | Differential Revision: https://reviews.llvm.org/D53411 llvm-svn: 346141
* [ThinLTO] Add an option to disable (thin)lto internalization.Xin Tong2018-11-051-2/+8
| | | | | | | | | | | | | | | | | | | | | | Summary: LTO and ThinLTO optimizes the IR differently. One source of differences is the amount of internalizations that can happen. Add an option to enable/disable internalization so that other differences can be studied in isolation. e.g. inlining. There are other things lto and thinlto do differently, I will add flags to enable/disable them as needed. Reviewers: tejohnson, pcc, steven_wu Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, dang, llvm-commits Differential Revision: https://reviews.llvm.org/D53294 llvm-svn: 346140
* [TargetLowering] Begin generalizing TargetLowering::expandFP_TO_SINT ↵Simon Pilgrim2018-11-051-26/+26
| | | | | | | | support. NFCI. Prior to initial work to add vector expansion support, remove assumptions that we're working on scalar types. llvm-svn: 346139
* [Inliner] Penalise inlining of calls with loops at OzDavid Green2018-11-051-0/+20
| | | | | | | | | | | | | | | | | | We currently seem to underestimate the size of functions with loops in them, both in terms of absolute code size and in the difficulties of dealing with such code. (Calls, for example, can be tail merged to further reduce codesize). At -Oz, we can then increase code size by inlining small loops multiple times. This attempts to penalise functions with loops at -Oz by adding a CallPenalty for each top level loop in the function. It uses LI (and hence DT) to calculate the number of loops. As we are dealing with minsize, the inline threshold is small and functions at this point should be relatively small, making the construction of these cheap. Differential Revision: https://reviews.llvm.org/D52716 llvm-svn: 346134
* [Mips] Supplement long branch pseudo instructionsStefan Maksimovic2018-11-055-11/+32
| | | | | | | | | | | Expand on LONG_BRANCH_LUi and LONG_BRANCH_(D)ADDiu pseudo instructions by creating variants which support less operands/accept GPR64Opnds as their operand in order to appease the machine verifier pass. Differential Revision: https://reviews.llvm.org/D53977 llvm-svn: 346133
* [AMDGPU] Fix the new atomic optimizer in pixel shaders.Neil Henning2018-11-051-2/+39
| | | | | | | | | | | | | | | | | The new atomic optimizer I previously added in D51969 did not work correctly when a pixel shader was using derivatives, and had helper lanes active. To fix this we add an llvm.amdgcn.ps.live call that guards a branch around the entire atomic operation - ensuring that all helper lanes are inactive within the wavefront when we compute our atomic results. I've added a test case that can cause derivatives, and exposes the problem. Differential Revision: https://reviews.llvm.org/D53930 llvm-svn: 346128
* [ARM] Turn assert into condition in ARMCGPSam Parker2018-11-051-3/+3
| | | | | | | | | Turn the assert in PrepareConstants into a conditon so that we can handle mul instructions with negative immediates. Differential Revision: https://reviews.llvm.org/D54094 llvm-svn: 346126
* [ARM][ARMCGP] Remove unecessary zexts and truncsSam Parker2018-11-051-33/+68
| | | | | | | | | | | | | | | | | | | r345840 slightly changed the way promotion happens which could result in zext and truncs having the same source and destination types. This fixes that issue. We can now also remove the zext and trunc in the following case: (zext (trunc (promoted op)), i32) This means that we can no longer treat a value, that is only used by a sink, to be safe to promote. I've also added in some extra asserts and replaced a cast for a dyn_cast. Differential Revision: https://reviews.llvm.org/D54032 llvm-svn: 346125
* [DAGCombiner] Use tryFoldToZero to simplify some code and make it work ↵Craig Topper2018-11-051-8/+2
| | | | | | | | correctly between LegalTypes and LegalOperations. The original code avoided creating a zero vector after type legalization, but if we're after type legalization the type we have is legal. The real hazard we need to avoid is creating a build vector after op legalization. tryFoldToZero takes care of checking for this. llvm-svn: 346119
* [DAGCombiner] Remove an unused argument from tryFoldToZero. NFCCraig Topper2018-11-051-4/+3
| | | | llvm-svn: 346118
* [AVR] Fix a backend bug that left extraneous operands after expansionDylan McKay2018-11-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | This patch fixes a bug in the AVR FRMIDX expansion logic. The expansion would leave a leftover operand from the original FRMIDX, but now attached to a MOVWRdRr instruction. The MOVWRdRr instruction did not expect this operand and so LLVM rejected the machine instruction. This would trigger an assertion: Assertion failed: ((isImpReg || Op.isRegMask() || MCID->isVariadic() || OpNo < MCID->getNumOperands() || isMetaDataOp) && "Trying to add an operand to a machine instr that is already done!"), function addOperand, file llvm/lib/CodeGen/MachineInstr.cpp Tim fixed this so that now the FRMIDX is expanded correctly into a well-formed MOVWRdRr. Patch by Tim Neumann llvm-svn: 346117
* [X86] Custom type legalize v2i8/v2i16/v2i32 mul to use to pmuludq.Craig Topper2018-11-051-0/+42
| | | | | | v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't about the upper bits of the type legalized multiply we can use the pmuludq to produce the multiply result for the bits we do care about. llvm-svn: 346115
* [AVR] Disallow the LDDWRdPtrQ instruction with Z as the destinationDylan McKay2018-11-052-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an AVR-specific workaround for a limitation of the register allocator that only exposes itself on targets with high register contention like AVR, which only has three pointer registers. The three pointer registers are X, Y, and Z. In most nontrivial functions, Y is reserved for the frame pointer, as per the calling convention. This leaves X and Z. Some instructions, such as LPM ("load program memory"), are only defined for the Z register. Sometimes this just leaves X. When the backend generates a LDDWRdPtrQ instruction with Z as the destination pointer, it usually trips up the register allocator with this error message: LLVM ERROR: ran out of registers during register allocation This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction from ever using the Z register as an operand. This gives the register allocator a bit more space to allocate, fixing the regalloc exhaustion error. Here is a description from the patch author Peter Nimmervoll As far as I understand the problem occurs when LDDWRdPtrQ uses the ptrdispregs register class as target register. This should work, but the allocator can't deal with this for some reason. So from my testing, it seams like (and I might be totally wrong on this) the allocator reserves the Z register for the ICALL instruction and then the register class ptrdispregs only has 1 register left and we can't use Y for source and destination. Removing the Z register from DREGS fixes the problem but removing Y register does not. More information about the bug can be found on the avr-rust issue tracker at https://github.com/avr-rust/rust/issues/37. A bug has raised to track the removal of this workaround and a proper fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553. Patch by Peter Nimmervoll llvm-svn: 346114
* [HotColdSplitting] Use TTI to inform outlining thresholdVedant Kumar2018-11-041-18/+26
| | | | | | | | | | | | | | | Using TargetTransformInfo allows the splitting pass to factor in the code size cost of instructions as it decides whether or not outlining is profitable. This did not regress the overall amount of outlining seen on the handful of internal frameworks I tested. Thanks to Jun Bum Lim for suggesting this! Differential Revision: https://reviews.llvm.org/D53835 llvm-svn: 346108
* [X86] Add vector shift by immediate to SimplifyDemandedBitsForTargetNode.Craig Topper2018-11-041-0/+42
| | | | | | | | | | | | | | Summary: This also enables some constant folding from KnownBits propagation. This helps on some cases vXi64 case in 32-bit mode where constant vectors appear as vXi32 and a bitcast. This can prevent getNode from constant folding sra/shl/srl. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54069 llvm-svn: 346102
* [ValueTracking] determine sign of 0.0 from select when matching min/max FPSanjay Patel2018-11-041-0/+21
| | | | | | | | | | | | | | | | | | | In PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 ..we may fail to recognize/simplify fabs() in some cases because we do not canonicalize fcmp with a -0.0 operand. Adding that canonicalization can cause regressions on min/max FP tests, so that's this patch: for the purpose of determining whether something is min/max, let the value returned by the select determine how we treat a 0.0 operand in the fcmp. This patch doesn't actually change the -0.0 to +0.0. It just changes the analysis, so we don't fail to recognize equivalent min/max patterns that only differ in the signbit of 0.0. Differential Revision: https://reviews.llvm.org/D54001 llvm-svn: 346097
* [DAGCombiner] Remove 'else' after return. NFCCraig Topper2018-11-041-8/+7
| | | | | | This makes this code consistent with the nearly identical code in visitZERO_EXTEND. llvm-svn: 346090
* [SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG ↵Craig Topper2018-11-046-59/+43
| | | | | | | | | | nodes. Move asserts into getNode. These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers. The rest of the patch is just changing all callers to use getNode directly. llvm-svn: 346087
* [X86] Update comment I forgot to change in r346043. NFCCraig Topper2018-11-031-2/+2
| | | | llvm-svn: 346073
* [ValueTracking] peek through 2-input shuffles in ComputeNumSignBitsSanjay Patel2018-11-031-19/+34
| | | | | | | | | | | This patch gives the IR ComputeNumSignBits the same functionality as the DAG version (the code is derived from the existing code). This an extension of the single input shuffle analysis added with D53659. Differential Revision: https://reviews.llvm.org/D53987 llvm-svn: 346071
* [codeview] Let the X86 backend tell us the VFRAME offset adjustmentReid Kleckner2018-11-033-17/+15
| | | | | | | | | | | Use MachineFrameInfo's OffsetAdjustment field to pass this information from the target to CodeViewDebug.cpp. The X86 backend doesn't use it for any other purpose. This fixes PR38857 in the case where there is a non-aligned quantity of CSRs and a non-aligned quantity of locals. llvm-svn: 346062
* [DWARF v5] Verifier: Add checks for DW_FORM_strx* forms.Wolfgang Pieb2018-11-031-0/+39
| | | | | | | | | Adding functionality to the DWARF verifier for DWARF v5 strx* forms which index into the string offsets table. Differential Revision: https://reviews.llvm.org/D54049 llvm-svn: 346061
* [LTO] Fix a crash caused by accessing an empty ValueInfoTeresa Johnson2018-11-021-12/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ModuleSummaryIndex::exportToDot crashes when linking the Linux kernel under ThinLTO using LLVMgold.so. This is due to the exportToDot function trying to get the GUID of an empty ValueInfo. The root cause related to the fact that we attempt to get the GUID of an aliasee via its OriginalGUID recorded in the aliasee summary, and that is not always possible. Specifically, we cannot do this mapping when the value is internal linkage and there were other internal linkage symbols with the same name. There are 2 fixes for the problem included here. 1) In all cases where we can currently print the dot file from the command line (which is only via save-temps), we have a valid AliaseeGUID in the AliasSummary. Use that when it is available, so that we can get the correct aliasee GUID whenever possible. 2) However, if we were to invoke exportToDot from the debugger right after it is built during the initial analysis step (i.e. the per-module summary), we won't have the AliaseeGUID field populated. In that case, we have a fallback fix that will simply print "@"+GUID when we aren't able to get the GUID from the OriginalGUID. It simply checks if the VI is valid or not before attempting to get the name. Additionally, since getAliaseeGUID will assert that the AliaseeGUID is non-zero, guard the earlier fix #1 by a new function hasAliaseeGUID(). Reviewers: pcc, tmroeder Subscribers: evgeny777, mehdi_amini, inglorion, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D53986 llvm-svn: 346055
* [X86] In LowerEXTEND_VECTOR_INREG, emit a vector shuffle instead of directly ↵Craig Topper2018-11-021-1/+1
| | | | | | | | using X86ISD::UNPCKL The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies. llvm-svn: 346050
* [WebAssembly] Parsing missing directives to produce valid .oWouter van Oortmerssen2018-11-022-33/+101
| | | | | | | | | | | | | | | | | | | | | | | Summary: The assembler was able to assemble and then dump back to .s, but was failing to parse certain directives necessary for valid .o output: - .type directives are now recognized to distinguish function symbols and others. - .size is now parsed to provide function size. - .globaltype (introduced in https://reviews.llvm.org/D54012) is now recognized to ensure symbols like __stack_pointer have a proper type set for both .s and .o output. Also added tests for the above. Reviewers: sbc100, dschuff Subscribers: jgravelle-google, aheejin, dexonsmith, kristina, llvm-commits, sunfish Differential Revision: https://reviews.llvm.org/D53842 llvm-svn: 346047
* [X86] Don't emit *_extend_vector_inreg nodes when both the input and output ↵Craig Topper2018-11-022-28/+19
| | | | | | | | | | | | | | | | types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043
OpenPOWER on IntegriCloud