path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* Fix __clang_call_terminate's argument for foreign exceptions | Heejin Ahn | 2019-08-11 | 1 | -1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When exceptions are repeatedly thrown in the middle of handling another exception, we call `__clang_call_terminate` with the exception pointer (i32) as an argument. But in case of foreign exceptions, we don't have the pointer, so we call the function with 0. (This requires `__clang_call_terminate` can deal with 0 argument, which will be done later) But previously the 0 argument was not added as a `i32.const 0` but an immediate by mistake, causing the `call` instruction to take not an i32 but rather an exnref, because an `exnref` is left on top of the value stack if `br_on_exn` is not taken. ``` block i32 br_on_exn 0, __cpp_exception ;; exnref is on top of stack now i32.const 0 ;; This was missing! call __clang_call_terminate unreachable end call __clang_call_terminate ;; This takes i32 extracted by br_on_exn ``` Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65475 llvm-svn: 368527
* [LICM] Make Loop ICM profile aware | Wenlei He | 2019-08-11 | 1 | -17/+73
| | | | | | | | | | | | | | | | | | | | | Summary: Hoisting/sinking an instruction out of a loop isn't always beneficial. Hoisting an instruction from a cold block inside a loop body out of the loop could hurt performance. This change makes Loop ICM profile aware - it now checks block frequency to make sure hoisting/sinking only moves instructions to colder blocks. Test Plan: ninja check Reviewers: asbirlea, sanjoy, reames, nikic, hfinkel, vsk Reviewed By: asbirlea Subscribers: fhahn, vsk, davidxl, xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65060 llvm-svn: 368526
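A minimal sketch of the idea behind the profile-aware check described in this entry, not the actual D65060 code: compare BlockFrequencyInfo numbers for the source and destination blocks and skip hoisting/sinking when the move would land in a hotter block. The helper name and its exact use are illustrative assumptions.

```cpp
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/IR/BasicBlock.h"

using namespace llvm;

// Illustrative helper (not the patch itself): returns true when moving an
// instruction from SrcBB to DstBB would place it in a block the profile says
// runs more often, i.e. the motion is likely a pessimization.
static bool movesToHotterBlock(const BasicBlock *SrcBB, const BasicBlock *DstBB,
                               const BlockFrequencyInfo *BFI) {
  if (!BFI)
    return false; // no profile data: keep the old, unconditional behavior
  return BFI->getBlockFreq(DstBB).getFrequency() >
         BFI->getBlockFreq(SrcBB).getFrequency();
}
```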
* Revert "test commit"Wenlei He2019-08-111-2/+0
| | | | | | This reverts commit ad92a4a2769425ad0d39ac1dbb6282f6f51a1af7. llvm-svn: 368525
* test commit | Wenlei He | 2019-08-11 | 1 | -0/+2
| | | | llvm-svn: 368524
* [X86] Remove some more code from combineShuffle that is no longer needed ↵ | Craig Topper | 2019-08-11 | 1 | -47/+0
| | | | | | with widening legalization. llvm-svn: 368523
* [X86] Remove some code from combineShuffle that seems largely unnecessary ↵ | Craig Topper | 2019-08-11 | 1 | -60/+0
| | | | | | | | | | | with widening legalization. The test case that changed is probably better served through allowing combineTruncatedArithmetic to create narrow vectors. It also appears InstCombine would have simplified this test case to remove the zext and trunc anyway. llvm-svn: 368522
* [InstCombine][NFC] Use SimplifyAddInst() instead of ↵ | Roman Lebedev | 2019-08-10 | 1 | -2/+2
| | | | | | SimplifyBinOp(Instruction::BinaryOps::Add, ) llvm-svn: 368521
* [InstCombine] Shift amount reassociation in bittest: relax one-use check ↵ | Roman Lebedev | 2019-08-10 | 1 | -1/+11
| | | | | | | | | | when shifting constant If one of the values being shifted is a constant, since the new shift amount is known-constant, the new shift will end up being constant-folded, so we don't need that one-use restriction. llvm-svn: 368519
* [InstCombine] Shift amount reassociation in bittest: drop pointless one-use ↵ | Roman Lebedev | 2019-08-10 | 1 | -2/+2
| | | | | | | | | | restriction That one-use restriction is not needed for correctness - we have already ensured that one of the shifts will go away, so we know we won't increase the instruction count. So there is no need for that restriction. llvm-svn: 368518
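For readers unfamiliar with the fold the two entries above are adjusting, here is a hedged C++ illustration of the source-level shape: two shifts whose amounts can be summed before the bit test. The function is made up for illustration; the actual InstCombine fold also checks that the combined shift amount stays within the bit width.

```cpp
// Source pattern behind the "shift amount reassociation in bittest" fold:
// ((x >> c1) >> c2) & 1 can become (x >> (c1 + c2)) & 1 when the combined
// shift amount is known to be in range.
bool bitTest(unsigned x, unsigned c1, unsigned c2) {
  return ((x >> c1) >> c2) & 1;
}
```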
* [X86][SSE] Lower shuffle as ANY_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-08-10 | 1 | -3/+3
| | | | | | | | | | On SSE41+ targets we always lower vector shuffles to ZERO_EXTEND_VECTOR_INREG, even if we don't need the extended bits. This patch relaxes this so that we lower to ANY_EXTEND_VECTOR_INREG if we can, meaning that shuffle combines have a better idea of what elements need to be kept zero. This helps the multiple reduction code as we can now combine away a lot more of the pack+extend codes. Differential Revision: https://reviews.llvm.org/D65741 llvm-svn: 368515
* [NFC][CodeGen] Modify the PI++ to ++PI in ↵ | Kang Zhang | 2019-08-10 | 1 | -1/+1
| | | | | | MachineBlockPlacement::optimizeBranches() llvm-svn: 368514
* [Reassociate] try harder to convert negative FP constants to positive | Sanjay Patel | 2019-08-10 | 1 | -72/+116
| | | | | | | | | | | | | | | | | | | | | | This is an extension of a transform that tries to produce positive floating-point constants to improve canonicalization (and hopefully lead to more reassociation and CSE). The original patches were: D4904 D5363 (rL221721) But as the test diffs show, these were limited to basic patterns by walking from an instruction to its single user rather than recursively moving up the def-use sequence. No fast-math is required here because we're only rearranging implicit FP negations in intermediate ops. A motivating bug is: https://bugs.llvm.org/show_bug.cgi?id=32939 Differential Revision: https://reviews.llvm.org/D65954 llvm-svn: 368512
* [CodeGen] Do the Simple Early Return in block-placement pass to optimize the ↵ | Kang Zhang | 2019-08-10 | 1 | -0/+36
| | | | | | | | | | | | | | blocks Summary: In the `block-placement` pass, it will create some patterns for unconditional branches where we can do the simple early return. But the `early-ret` pass runs before `block-placement`, and we don't want to run it again. This patch does the simple early return to optimize the blocks at the end of `block-placement`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D63972 llvm-svn: 368509
* [X86] Match the IR pattern for movmsk on SSE1-only targets where v4i32 ↵ | Craig Topper | 2019-08-10 | 1 | -3/+22
| | | | | | | | | | | | | | | | | isn't legal Summary: This patch adds a special DAG combine for SSE1 to recognize the IR pattern InstCombine gives us for movmsk. This only does the recognition for a few cases where it's obvious the input won't be scalarized, resulting in building a vector just due to the movmsk. I've made it separate from our existing matching for movmsk since that's called in multiple places and I didn't spend time to see if the other callers would make sense here. Plus the restrictions and additional checks would complicate that. This fixes the case from PR42870. But it's probably still broken in the presence of logic ops feeding the movmsk pattern, which would further hide the v4f32 type. Reviewers: spatel, RKSimon, xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65689 llvm-svn: 368506
* [X86] Improve the diagnostic for larger than 4-bit immediate for ↵ | Craig Topper | 2019-08-10 | 3 | -3/+22
| | | | | | vpermil2pd/ps. Only allow MCConstantExprs. llvm-svn: 368505
* [X86] Fix stack probe issue on windows32. | Luo, Yuanke | 2019-08-10 | 4 | -8/+26
| | | | | | | | | | | | | | | | | | | | | Summary: On Windows, if the frame size exceeds 4096 bytes, the compiler needs to generate a call to _alloca_probe. The X86CallFrameOptimization pass changes the reserved stack size, which causes the stack probe function to not be inserted. This patch fixes the issue by detecting the call frame size; if the size exceeds 4096 bytes, X86CallFrameOptimization is dropped. Reviewers: craig.topper, wxiao3, annita.zhang, rnk, RKSimon Reviewed By: rnk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65923 llvm-svn: 368503
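An illustrative example (not taken from the patch or its tests) of the situation this fix targets: a 32-bit Windows function whose frame exceeds one 4096-byte page, so the compiler must emit the stack-probe call named above rather than letting X86CallFrameOptimization change the reserved stack size.

```cpp
// With more than 4096 bytes of locals, the prologue needs a probe call
// (_alloca_probe, per the commit message) so each new stack page is touched
// in order before it is used.
void touchesBigFrame() {
  volatile char buf[8192];     // > 4096 bytes: frame crosses a guard page
  buf[0] = 1;
  buf[sizeof(buf) - 1] = 2;
}
```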
* [MemDep] allow to select block-scan-limit when constructing ↵ | Fedor Sergeev | 2019-08-10 | 1 | -5/+8
| | | | | | | | | | | | | MemoryDependenceAnalysis Introducing non-global control for default block-scan-limit in MemDep analysis. Useful when there are many compilations per initialized LLVM instance (e.g. JIT). Reviewed By: asbirlea Tags: #llvm Differential Revision: https://reviews.llvm.org/D65806 llvm-svn: 368502
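A hedged sketch of how an embedder such as a JIT might use the new per-instance control; the constructor parameter added by D65806 is assumed here, and its name and the chosen limit are illustrative rather than taken from the patch.

```cpp
#include "llvm/Analysis/MemoryDependenceAnalysis.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Assumption: the analysis can now be constructed with a default block-scan
// limit instead of relying solely on the global -memdep-block-scan-limit.
void registerMemDepWithCustomLimit(FunctionAnalysisManager &FAM) {
  FAM.registerPass([] { return MemoryDependenceAnalysis(/*BlockScanLimit=*/200); });
}
```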
* cfi-icall: Allow the jump table to be optionally made non-canonical. | Peter Collingbourne | 2019-08-09 | 3 | -90/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The default behavior of Clang's indirect function call checker will replace the address of each CFI-checked function in the output file's symbol table with the address of a jump table entry which will pass CFI checks. We refer to this as making the jump table `canonical`. This property allows code that was not compiled with ``-fsanitize=cfi-icall`` to take a CFI-valid address of a function, but it comes with a couple of caveats that are especially relevant for users of cross-DSO CFI: - There is a performance and code size overhead associated with each exported function, because each such function must have an associated jump table entry, which must be emitted even in the common case where the function is never address-taken anywhere in the program, and must be used even for direct calls between DSOs, in addition to the PLT overhead. - There is no good way to take a CFI-valid address of a function written in assembly or a language not supported by Clang. The reason is that the code generator would need to insert a jump table in order to form a CFI-valid address for assembly functions, but there is no way in general for the code generator to determine the language of the function. This may be possible with LTO in the intra-DSO case, but in the cross-DSO case the only information available is the function declaration. One possible solution is to add a C wrapper for each assembly function, but these wrappers can present a significant maintenance burden for heavy users of assembly in addition to adding runtime overhead. For these reasons, we provide the option of making the jump table non-canonical with the flag ``-fno-sanitize-cfi-canonical-jump-tables``. When the jump table is made non-canonical, symbol table entries point directly to the function body. Any instances of a function's address being taken in C will be replaced with a jump table address. This scheme does have its own caveats, however. It does end up breaking function address equality more aggressively than the default behavior, especially in cross-DSO mode which normally preserves function address equality entirely. Furthermore, it is occasionally necessary for code not compiled with ``-fsanitize=cfi-icall`` to take a function address that is valid for CFI. For example, this is necessary when a function's address is taken by assembly code and then called by CFI-checking C code. The ``__attribute__((cfi_jump_table_canonical))`` attribute may be used to make the jump table entry of a specific function canonical so that the external code will end up taking a address for the function that will pass CFI checks. Fixes PR41972. Differential Revision: https://reviews.llvm.org/D65629 llvm-svn: 368495
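A hedged illustration of the cross-DSO caveat described in this entry: a function whose address is taken by hand-written assembly and later called indirectly by CFI-checked code. When building with `-fno-sanitize-cfi-canonical-jump-tables`, the attribute named above makes just this function's jump-table entry canonical again. The function and build line are made up for the example.

```cpp
// Built with something like:
//   clang++ -flto -fvisibility=hidden -fsanitize=cfi-icall \
//           -fno-sanitize-cfi-canonical-jump-tables ...
// The attribute restores a canonical (CFI-valid) symbol-table address for
// this one function, so assembly code that takes &interrupt_handler gets an
// address that passes CFI checks when called indirectly from C++.
extern "C" __attribute__((cfi_jump_table_canonical))
void interrupt_handler(void) {
  // body referenced by address from an assembly-built dispatch table
}
```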
* [DAGCombiner] exclude x*2.0 from normal negation profitability rules | Sanjay Patel | 2019-08-09 | 1 | -0/+5
| | | | | | | | | | | | | | | | | | | | | | | This is the codegen part of fixing: https://bugs.llvm.org/show_bug.cgi?id=32939 Even with the optimal/canonical IR that is ideally created by D65954, we would reverse that transform in DAGCombiner and end up with the same asm on AArch64 or x86. I see 2 options for trying to correct this: 1. Limit isNegatibleForFree() by special-casing the fmul pattern (this patch). 2. Avoid creating (fmul X, 2.0) in the 1st place by adding a special-case transform to SelectionDAG::getNode() and/or SelectionDAGBuilder::visitFMul() that matches the transform done by DAGCombiner. This seems like the less intrusive patch, but if there's some other reason to prefer 1 option over the other, we can change to the other option. Differential Revision: https://reviews.llvm.org/D66016 llvm-svn: 368490
* [globalisel] Add G_SEXT_INREG | Daniel Sanders | 2019-08-09 | 10 | -4/+205
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487
* Remove variable only used in an assert. | Eric Christopher | 2019-08-09 | 1 | -2/+1
| | | | llvm-svn: 368486
* [X86] Remove custom handling for extloads from LowerLoad. | Craig Topper | 2019-08-09 | 1 | -183/+1
| | | | | | We don't appear to need this with widening legalization. llvm-svn: 368479
* [CodeGen] Require a name for a block addr target | Bill Wendling | 2019-08-09 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | Summary: A block address may be used in inline assembly, in which case it requires a name so that the asm parser has something to parse. Creating a name for every block address is a large hammer, but is necessary because at the point when a temp symbol is created we don't necessarily know if it's used in inline asm. This ensures that it exists regardless. Reviewers: nickdesaulniers, craig.topper Subscribers: nathanchance, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65352 llvm-svn: 368478
* [MC] Don't recreate a label if it's already used | Bill Wendling | 2019-08-09 | 3 | -1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch keeps track of MCSymbols created for blocks that were referenced in inline asm. It prevents creating a new symbol which doesn't refer to the block. Inline asm may have a reference to a label. The asm parser however doesn't recognize it as a label and tries to create a new symbol. The result being that instead of the original symbol (e.g. ".Ltmp0") the parser replaces it in the inline asm with the new one (e.g. ".Ltmp00") without updating it in the symbol table. So the machine basic block retains the "old" symbol (".Ltmp0"), but the inline asm uses the new one (".Ltmp00"). Reviewers: nickdesaulniers, craig.topper Subscribers: nathanchance, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65304 llvm-svn: 368477
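Both of the entries above concern basic-block symbols referenced from inline assembly. A small, self-contained example of the construct involved (illustrative, not taken from the patches) is an `asm goto` that names a block address via `%l[...]`:

```cpp
// The asm template refers to the address of the 'is_zero' block, so CodeGen
// must give that block a named symbol and MC must not replace it with a
// fresh, unrelated label.
int check(int x) {
  asm goto("testl %0, %0\n\t"
           "jz %l[is_zero]"
           : /* no outputs */
           : "r"(x)
           : "cc"
           : is_zero);
  return 1;
is_zero:
  return 0;
}
```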
* [InstCombine] Refactor optimizeExp2() (NFC) | Evandro Menezes | 2019-08-09 | 1 | -31/+19
| | | | | | | Refactor `LibCallSimplifier::optimizeExp2()` to use the new `emitBinaryFloatFnCall()` version that fetches the function name from TLI. llvm-svn: 368457
* [Transforms] Add an emitBinaryFloatFnCall() version that fetches the function ↵ | Evandro Menezes | 2019-08-09 | 1 | -9/+35
| | | | | | | | | | name from TLI Add the counterpart to a similar function for single operands. Differential revision: https://reviews.llvm.org/D65976 llvm-svn: 368453
* Print reasonable representations of type names in llvm-nm, readelf and readobj | Sunil Srivastava | 2019-08-09 | 1 | -1/+10
| | | | | | | | | For type values that do not have proper names, print a reasonable representation in llvm-nm, llvm-readobj and llvm-readelf, matching GNU tools. Fixes PR41713. Differential Revision: https://reviews.llvm.org/D65537 llvm-svn: 368451
* [Transforms] Rename hasUnaryFloatFn() and getUnaryFloatFn() (NFC) | Evandro Menezes | 2019-08-09 | 3 | -23/+19
| | | | | | Rename `hasUnaryFloatFn()` to `hasFloatFn()` and `getUnaryFloatFn()` to `getFloatFnName()`. llvm-svn: 368449
* [DAGCombiner] remove redundant fold for X*1.0; NFC | Sanjay Patel | 2019-08-09 | 1 | -4/+0
| | | | | | | | | This is handled at node creation time (similar to X/1.0) after: rL357029 (no fast-math-flags needed) llvm-svn: 368443
* [MachinePipeliner] Avoid indeterminate order in FuncUnitSorter | Jinsong Ji | 2019-08-09 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is exposed by adding a new testcase in PowerPC in https://reviews.llvm.org/rL367732 The testcase got different output on different platforms, hence breaking buildbots. The problem is that we get a different FuncUnitOrder when calculating ResMII. The root cause is: 1. Two MachineInstrs might get the SAME priority (MFUsx) from minFuncUnits. 2. The current comparison operator() will return `MFUs1 > MFUs2`. 3. We use iterators for MachineInstr, so the input to FuncUnitSorter might be different on different platforms due to the iterator nature. So for two MIs with the same MFU, their order actually depends on the iterator order, which is platform (implementation) dependent. This is risky, and may cause cross-compiling problems. The fix is to make sure we assign a deterministic order when they are equal. Reviewers: bcahoon, hfinkel, jmolloy Subscribers: nemanjai, hiraditya, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65992 llvm-svn: 368441
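A hedged sketch of the general fix pattern this entry describes, with names invented for illustration: when the primary sort key ties, compare a stable secondary key instead of falling back to pointer or iterator order, which differs between hosts.

```cpp
#include <tuple>

struct InstrEntry {
  unsigned MinFuncUnits; // primary key; two instructions can tie here
  unsigned InstrOrder;   // stable secondary key, e.g. position in the block
};

// Deterministic comparator: equal MinFuncUnits no longer leaves the result up
// to the platform-dependent iteration order of the inputs.
struct FuncUnitSorterLike {
  bool operator()(const InstrEntry &A, const InstrEntry &B) const {
    return std::tie(A.MinFuncUnits, A.InstrOrder) >
           std::tie(B.MinFuncUnits, B.InstrOrder);
  }
};
```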
* Title: Loop Cache Analysis | Whitney Tsang | 2019-08-09 | 4 | -0/+628
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Implement a new analysis to estimate the number of cache lines required by a loop nest. The analysis is largely based on the following paper: Compiler Optimizations for Improving Data Locality By: Steve Carr, Katherine S. McKinley, Chau-Wen Tseng http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf The analysis considers temporal reuse (accesses to the same memory location) and spatial reuse (accesses to memory locations within a cache line). For simplicity the analysis considers memory accesses in the innermost loop in a loop nest, and thus determines the number of cache lines used when the loop L in loop nest LN is placed in the innermost position. The result of the analysis can be used to drive several transformations. As an example, loop interchange could use it determine which loops in a perfect loop nest should be interchanged to maximize cache reuse. Similarly, loop distribution could be enhanced to take into consideration cache reuse between arrays when distributing a loop to eliminate vectorization inhibiting dependencies. The general approach taken to estimate the number of cache lines used by the memory references in the inner loop of a loop nest is: Partition memory references that exhibit temporal or spatial reuse into reference groups. For each loop L in the a loop nest LN: a. Compute the cost of the reference group b. Compute the 'cache cost' of the loop nest by summing up the reference groups costs For further details of the algorithm please refer to the paper. Authored By: etiotto Reviewers: hfinkel, Meinersbur, jdoerfert, kbarton, bmahjour, anemet, fhahn Reviewed By: Meinersbur Subscribers: reames, nemanjai, MaskRay, wuzish, Hahnfeld, xusx595, venkataramanan.kumar.llvm, greened, dmgreen, steleman, fhahn, xblvaOO, Whitney, mgorny, hiraditya, mgrang, jsji, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D63459 llvm-svn: 368439
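An illustrative loop nest (not taken from the patch's tests) of the kind the analysis reasons about: with the `j` loop innermost, accesses to `A` walk consecutive addresses and share cache lines (spatial reuse) while `B[i]` is reused on every iteration (temporal reuse); interchanging the loops would touch a new line of `A` on almost every access, which is exactly the cost difference the analysis is meant to expose.

```cpp
constexpr int N = 1024;
float A[N][N];
float B[N];

void rowMajorSum() {
  for (int i = 0; i < N; ++i)     // outer loop
    for (int j = 0; j < N; ++j)   // innermost loop: consecutive addresses in A[i][...]
      B[i] += A[i][j];            // spatial reuse on A, temporal reuse on B[i]
}
```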
* [X86][SSE] Swap X86ISD::BLENDV inputs with an inverted selection mask (PR42825) | Simon Pilgrim | 2019-08-09 | 1 | -0/+6
| | | | | | | | As discussed on PR42825, if we are inverting the selection mask we can just swap the inputs and avoid the inversion. Differential Revision: https://reviews.llvm.org/D65522 llvm-svn: 368438
* [GlobalOpt] prevent crashing on large integer types (PR42932) | Sanjay Patel | 2019-08-09 | 1 | -2/+4
| | | | | | | | | | | This is a minimal fix (copy the predicate for the assert) to prevent the crashing seen in: https://bugs.llvm.org/show_bug.cgi?id=42932 ...when converting a constant integer of arbitrary width to uint64_t. Differential Revision: https://reviews.llvm.org/D65970 llvm-svn: 368437
* [Mips][Codegen] Fix fast-isel mixing of FGR64 and AFGR64 registers | Simon Atanasyan | 2019-08-09 | 1 | -2/+8
| | | | | | | | | Fast-isel was picking the AFGR64 register class for processing call arguments when the +fp64 option was used. We simply check if the +fp64 option is used and pick the appropriate register class. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D65886 llvm-svn: 368433
* [MCA] Add flag -show-encoding to llvm-mca. | Andrea Di Biagio | 2019-08-09 | 2 | -0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flag -show-encoding enables the printing of instruction encodings as part of the instruction info view. Example (with flags -mtriple=x86_64-- -mcpu=btver2): Instruction Info: [1]: #uOps [2]: Latency [3]: RThroughput [4]: MayLoad [5]: MayStore [6]: HasSideEffects (U) [7]: Encoding Size [1] [2] [3] [4] [5] [6] [7] Encodings: Instructions: 1 2 1.00 4 c5 f0 59 d0 vmulps %xmm0, %xmm1, %xmm2 1 4 1.00 4 c5 eb 7c da vhaddps %xmm2, %xmm2, %xmm3 1 4 1.00 4 c5 e3 7c e3 vhaddps %xmm3, %xmm3, %xmm4 In this example, column Encoding Size is the size in bytes of the instruction encoding. Column Encodings reports the actual instruction encodings as byte sequences in hex (objdump style). The computation of encodings is done by a utility class named mca::CodeEmitter. In the future, I plan to expose the CodeEmitter to the instruction builder, so that information about instruction encoding sizes can be used by the simulator. That would be a first step towards simulating the throughput from the decoders in the hardware frontend. Differential Revision: https://reviews.llvm.org/D65948 llvm-svn: 368432
* [AArch64] Set pref. func. align to 8 bytes on Neoverse E1 & Cortex-A65 | Pablo Barrio | 2019-08-09 | 1 | -0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The Arm Neoverse E1 and Cortex-A65 Software Optimization Guides [1][2], Section "4.7 Branch instruction alignment", state: "It is preferable for branch targets, including subroutine entry points, to be placed on aligned 64-bit boundaries to maximize instruction fetch efficiency." This patch sets the preferred function alignment on Neoverse E1 and Cortex-A65 to 2^3=8B. This was already the case in some Cortex-A CPUs such as Cortex-A53. [1] https://developer.arm.com/docs/swog466751/latest/arm-neoversetm-e1-core-software-optimization-guide [2] https://developer.arm.com/docs/swog010045/latest/arm-cortex-a65-core-software-optimization-guide Reviewers: dmgreen, fhahn, samparker Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65937 llvm-svn: 368431
* AArch64: support TLS on Darwin platforms in GlobalISel. | Tim Northover | 2019-08-09 | 1 | -4/+34
| | | | | | | All TLS access on Darwin is in the "general dynamic" form where we call a function to resolve the address, so implementation is pretty simple. llvm-svn: 368418
* GlobalISel: pack various parameters for lowerCall into a struct. | Tim Northover | 2019-08-09 | 10 | -115/+94
| | | | | | | | | I've now needed to add an extra parameter to this call twice recently. Not only is the signature getting extremely unwieldy, but just updating all of the callsites and implementations is a pain. Putting the parameters in a struct sidesteps both issues. llvm-svn: 368408
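A hedged sketch of the refactoring pattern described here, with made-up names rather than the actual GlobalISel types: a parameter struct lets new information ride along without touching every lowerCall signature and call site.

```cpp
// Illustrative parameter-object shape; the fields are examples, not the ones
// added by the patch.
struct CallLoweringInfoSketch {
  unsigned CallConv = 0;
  bool IsTailCall = false;
  bool IsVarArg = false;
  // A new parameter becomes a new field with a default, so existing callers
  // keep compiling unchanged.
};

struct CallLowererSketch {
  bool lowerCall(const CallLoweringInfoSketch &Info) {
    // ... consume Info.CallConv, Info.IsTailCall, Info.IsVarArg ...
    return true;
  }
};
```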
* [ARM][ParallelDSP] Replace SExt uses | Sam Parker | 2019-08-09 | 1 | -3/+5
| | | | | | | | As loads are combined and widened, we replaced the operands of their sext users, whereas we should have been replacing the uses of the sext. I've added a load of tests; only a few of them originally caused assertion failures, the rest improve pattern coverage. Differential Revision: https://reviews.llvm.org/D65740 llvm-svn: 368404
* [InstSimplify] Report "Changed" also when only deleting dead instructions | Bjorn Pettersson | 2019-08-09 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Make sure that we report that changes have been made by InstSimplify also in situations when only trivially dead instructions have been removed. If, for example, a call is removed, the call graph must be updated. The bug seems to have been introduced by llvm-svn r367173 (commit 02b9e45a7e4b81), since the code in question was rewritten in that commit. Reviewers: spatel, chandlerc, foad Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65973 llvm-svn: 368401
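A hedged sketch of the bug class this fix addresses, not the InstSimplify code itself: a cleanup loop that erases trivially dead instructions must still report that it changed the function, otherwise callers such as the pass manager keep stale bookkeeping like the call graph.

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Transforms/Utils/Local.h"

using namespace llvm;

static bool removeTriviallyDead(Function &F) {
  bool Changed = false;
  for (BasicBlock &BB : F)
    for (auto It = BB.begin(); It != BB.end();) {
      Instruction &I = *It++;            // advance before possibly erasing I
      if (isInstructionTriviallyDead(&I)) {
        I.eraseFromParent();
        Changed = true;                  // the essence of the fix: record it
      }
    }
  return Changed;                        // must propagate to the caller
}
```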
* [X86] Remove code that expands truncating stores from combineStore. | Craig Topper | 2019-08-09 | 1 | -76/+1
| | | | | | | We shouldn't form trunc stores that need to be expanded now that we are using widening legalization. llvm-svn: 368400
* [X86] Remove stale FIXME from combineMaskedStore. NFC | Craig Topper | 2019-08-09 | 1 | -4/+0
| | | | I believe PR34584 was tracking that FIXME, but it's since been closed and a test case was added. llvm-svn: 368397
* [X86] Remove DAG combine expansion of extending masked load and truncating ↵ | Craig Topper | 2019-08-09 | 1 | -181/+24
| | | | | | | | | | masked store. The only way to generate these was through promoting legalization of narrow vectors, but we widen those types now. So we shouldn't produce these nodes. llvm-svn: 368396
* [X86] Remove handler for (U/S)(ADD/SUB)SAT from ReplaceNodeResults. Remove ↵ | Craig Topper | 2019-08-09 | 1 | -9/+4
| | | | | | | | TypeWidenVector check from code that handles X86ISD::VPMADDWD and X86ISD::AVG. More unneeded code since we now legalize narrow vectors by widening. llvm-svn: 368395
* [X86] Remove ISD::SETCC handling from ReplaceNodeResults. | Craig Topper | 2019-08-09 | 1 | -27/+0
| | | | | | This is no longer needed since we widen v2i32 instead of promoting. llvm-svn: 368394
* [X86] Simplify ISD::LOAD handling in ReplaceNodeResults and ISD::STORE ↵ | Craig Topper | 2019-08-09 | 1 | -12/+10
| | | | | | handling in LowerStore now that v2i32 is widened to v4i32. llvm-svn: 368390
* [X86] Merge v2f32 and v2i32 gather/scatter handling in ↵ | Craig Topper | 2019-08-09 | 1 | -86/+12
| | | | | | ReplaceNodeResults/LowerMSCATTER now that v2i32 is also widened like v2f32. llvm-svn: 368389
* [X86] Remove unreachable handling for f64->v2i32/v4i16/v8i8 bitcasts from ↵ | Craig Topper | 2019-08-09 | 1 | -14/+0
| | | | | | | | ReplaceNodeResults. We rely on the generic type legalizer for this now. llvm-svn: 368388
* [X86] Simplify ReplaceNodeResults handling for FP_TO_SINT/UINT for vectors ↵ | Craig Topper | 2019-08-09 | 1 | -44/+10
| | | | | | to only handle widening. llvm-svn: 368387
* [X86] Simplify ReplaceNodeResults handling for ↵ | Craig Topper | 2019-08-09 | 1 | -4/+5
| | | | | | SIGN_EXTEND/ZERO_EXTEND/TRUNCATE for vectors to only handle widening. llvm-svn: 368386