path: root/llvm/lib
...
* [LV] Deny irregular types in interleavedAccessCanBeWidened (Bjorn Pettersson, 2019-06-17, 1 file, +7/-0)

    Summary:
    Avoid having the loop vectorizer create loads/stores of vectors with
    "irregular" types when interleaving. An example of an irregular type is
    x86_fp80, which is 80 bits wide but may have an allocation size of 96 bits,
    so an array of x86_fp80 is not bitcast compatible with a vector of the
    same type.

    Not sure if interleavedAccessCanBeWidened is the best place for this check,
    but it solves the problem seen in the added test case. And it is the same
    kind of check that already exists in memoryInstructionCanBeWidened.

    Reviewers: fhahn, Ayal, craig.topper
    Reviewed By: fhahn
    Subscribers: hiraditya, rkruppe, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D63386
    llvm-svn: 363547
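    A minimal IR sketch of the layout mismatch (hand-written for illustration;
    the exact allocation size of x86_fp80 depends on the target data layout):

        ; In an array each x86_fp80 element occupies its allocation size
        ; (including padding), while a vector would pack elements back to back,
        ; so this bitcast would not reinterpret the same bytes:
        %vp = bitcast [4 x x86_fp80]* %arr to <4 x x86_fp80>*
        %w  = load <4 x x86_fp80>, <4 x x86_fp80>* %vp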
* [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting (Luis Marques, 2019-06-17, 3 files, +68/-12)

    Some GEPs were not being split, presumably because that split would just be
    undone by the DAGCombiner. Not performing those splits can block important
    optimizations, such as folding the element indices / member offsets
    (partially) into load/store instruction immediates. This patch:
    - Makes the splits also occur in the cases where the base address and the
      GEP are in the same BB.
    - Ensures that the DAGCombiner doesn't reassociate them back again.

    Differential Revision: https://reviews.llvm.org/D60294
    llvm-svn: 363544
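    A rough sketch of the kind of split involved (illustrative only; the struct
    type and value names are made up, not taken from the patch):

        ; Before: one GEP mixes the variable index with the constant field offset.
        %p = getelementptr inbounds %struct.S, %struct.S* %base, i64 %i, i32 3
        %v = load i32, i32* %p

        ; After splitting: the constant part is a separate GEP next to the load,
        ; so the backend can fold it into the load's immediate offset.
        %q  = getelementptr inbounds %struct.S, %struct.S* %base, i64 %i
        %p2 = getelementptr inbounds %struct.S, %struct.S* %q, i32 0, i32 3
        %v2 = load i32, i32* %p2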
* Fix clang -Wcovered-switch-default after stack-id change by D60137 (Fangrui Song, 2019-06-17, 1 file, +7/-8)
    llvm-svn: 363543
* [SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode (Simon Pilgrim, 2019-06-17, 1 file, +6/-0)

    This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a
    number of shuffles across different vector widths recognise when they come
    from the same source.

    llvm-svn: 363542
* [SCEV] Use NoWrapFlags when expanding a simple mul (Sam Parker, 2019-06-17, 1 file, +2/-2)

    Second functional change following on from rL362687. Pass the NoWrapFlags
    from the MulExpr to InsertBinop when we're generating a shl or mul.

    Differential Revision: https://reviews.llvm.org/D61934
    llvm-svn: 363540
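    Illustrative IR only (not from the patch): when a SCEV multiply by a power
    of two is expanded as a shift, its no-wrap flags can now be carried over:

        ; expansion of a SCEV like (8 * %x)<nuw><nsw>
        %r = shl nuw nsw i64 %x, 3    ; previously emitted without the flags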
* [ARM] Fix another -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265 (Fangrui Song, 2019-06-17, 1 file, +1/-1)
    llvm-svn: 363535
* [ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265 (Fangrui Song, 2019-06-17, 1 file, +1/-1)
    llvm-svn: 363534
* Describe stack-id as an enum (Sander de Smalen, 2019-06-17, 8 files, +44/-22)

    This patch changes MIR stack-id from an integer to an enum, and adds
    printing/parsing support for this in MIR files. The default stack-id '0' is
    now renamed to 'default'.

    This should make MIR tests that have stack objects with different stack-ids
    more descriptive. It also clarifies code operating on StackID.

    Reviewers: arsenm, thegameg, qcolombet
    Reviewed By: arsenm
    Differential Revision: https://reviews.llvm.org/D60137
    llvm-svn: 363533
* [ARM] Remove ARMComputeBlockSize (Sam Parker, 2019-06-17, 1 file, +0/-80)

    Forgot to remove file!

    llvm-svn: 363532
* [ARM] Add ARMBasicBlockInfo.cpp (Sam Parker, 2019-06-17, 1 file, +146/-0)

    Forgot to add file!

    llvm-svn: 363531
* [ARM] Extract some code from ARMConstantIslandPass (Sam Parker, 2019-06-17, 4 files, +101/-112)

    Create the ARMBasicBlockUtils class for tracking and querying basic block
    sizes so we can use them when generating low-overhead loops.

    Differential Revision: https://reviews.llvm.org/D63265
    llvm-svn: 363530
* Re-commit r357452 (take 3): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)" (Hans Wennborg, 2019-06-17, 1 file, +15/-14)

    Third time's the charm. This was reverted in r363220 due to being suspected
    of an internal benchmark regression and a test failure, none of which turned
    out to be caused by this.

    llvm-svn: 363529
* [SimplifyCFG] Fix prof branch_weights MD while removing unreachable switch cases (Yevgeny Rouban, 2019-06-17, 1 file, +5/-4)

    SimplifyCFG has a bug that results in inconsistent prof branch_weights
    metadata if unreachable switch cases are removed. This patch fixes this bug
    by making use of the newly introduced SwitchInstProfUpdateWrapper class
    (see patch D62122). A new test is created.

    Differential Revision: https://reviews.llvm.org/D62186
    llvm-svn: 363527
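    A hand-written illustration (the weights are made up, not from the patch) of
    why the metadata has to be updated together with the case list:

        switch i32 %x, label %default [
          i32 0, label %bb0
          i32 1, label %bb1    ; if %bb1 is proven unreachable and this case is
        ], !prof !0            ; removed, its weight must be dropped as well
        ...
        !0 = !{!"branch_weights", i32 1, i32 20, i32 30} ; default, case 0, case 1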
* PowerPC: Optimize SPE double parameter calling setup (Justin Hibbits, 2019-06-17, 6 files, +172/-49)

    Summary:
    SPE passes doubles the same as soft-float, in register pairs as i32 types.
    This is all handled by the target-independent layer. However, this is not
    optimal when splitting or reforming the doubles, as it pushes the value to
    the stack and loads it back on either side.

    For instance, to pass a double argument to a function, assuming the double
    value is in r5, the sequence currently looks like this:

        evstdd 5, X(1)
        lwz 3, X(1)
        lwz 4, X+4(1)

    Likewise, to form a double into r5 from args in r3 and r4:

        stw 3, X(1)
        stw 4, X+4(1)
        evldd 5, X(1)

    This optimizes the fence to use SPE instructions. Now, to pass a double to
    a function:

        mr 4, 5
        evmergehi 3, 5, 5

    And to form a double into r5 from args in r3 and r4:

        evmergelo 5, 3, 4

    This is comparable to the way that gcc generates the double splits.

    This also fixes a bug with expanding builtins to libcalls, where the
    LowerCallTo() code path was generating intermediate illegal type nodes.

    Reviewers: nemanjai, hfinkel, joerg
    Subscribers: kbarton, jfb, jsji, llvm-commits
    Differential Revision: https://reviews.llvm.org/D54583
    llvm-svn: 363526
* [X86] Add TB_NO_REVERSE to some folding table entries where the register form uses the REX prefix, but the memory form does not (Craig Topper, 2019-06-16, 1 file, +9/-9)

    It would not be safe to unfold the memory form to the register form without
    checking that we are compiling for 64-bit mode.

    This probably isn't a real functional issue, since we are unlikely to unfold
    any of these instructions: they don't have any tied registers, aren't
    commutable, and don't have any inputs other than the address.

    llvm-svn: 363523
* [InstSimplify] Fix addo/subo undef folds (PR42209) (Roman Lebedev, 2019-06-16, 1 file, +11/-8)

    Fix folds of addo and subo with an undef operand to be:
    `@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`,
    as per LLVM undef rules. Same for commuted variants.

    Based on the original version of the patch by @nikic.

    Fixes PR42209 (https://bugs.llvm.org/show_bug.cgi?id=42209)

    Differential Revision: https://reviews.llvm.org/D63065
    llvm-svn: 363522
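    A minimal IR sketch of the fold (hand-written here, not copied from the
    patch):

        %res = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %x, i32 undef)
        ; now simplifies to { undef, false }; the same applies to the signed
        ; and subtract variants, and to the commuted operand order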
* AMDGPU: Prepare for explicit absolute relocations in code generation (Nicolai Haehnle, 2019-06-16, 5 files, +28/-8)

    Summary:
    We will use absolute relocations for LDS symbols.

    Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325

    Reviewers: arsenm, rampitec
    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D61492
    llvm-svn: 363517
* AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0 (Nicolai Haehnle, 2019-06-16, 3 files, +19/-10)

    Summary:
    Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just
    use a literal target constant. This simplifies some subsequent changes.

    The generated assembly is now more explicit about the kind of relocation
    that is to be used.

    Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8

    Reviewers: arsenm, rampitec
    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D61491
    llvm-svn: 363516
* AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic (Nicolai Haehnle, 2019-06-16, 4 files, +22/-13)

    Summary:
    Change-Id: Ie4c971462a7749740938c687144e77441dac2539

    Reviewers: rampitec, arsenm
    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D62486

    Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0
    llvm-svn: 363514
* [AMDGPU] gfx10 conditional registers handling (Stanislav Mekhanoshin, 2019-06-16, 18 files, +599/-211)

    This is the C++ source part of wave32 support, excluding the overridden
    getRegClass().

    Differential Revision: https://reviews.llvm.org/D63351
    llvm-svn: 363513
* [CodeGenPrepare][x86] shift both sides of a vector select when profitable (Sanjay Patel, 2019-06-16, 1 file, +42/-3)

    This is based on the example/discussion in PR37428:
    https://bugs.llvm.org/show_bug.cgi?id=37428

    Proper vector shift instructions don't appear until AVX2, so we may generate
    several extra instructions within a loop trying to compensate for that. It's
    difficult to recover from that shift expansion later than this, so use the
    existing TLI hook and splat analysis to enable better codegen.

    This extends CGP functionality introduced with:
    rL201655

    Differential Revision: https://reviews.llvm.org/D63233
    llvm-svn: 363511
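    A constructed sketch of the general shape of the rewrite the title describes
    (names and constants are hypothetical; the actual profitability decision is
    made through the TLI hook mentioned above):

        ; Before: the shift amount is a select between two splatted constants.
        %amt = select i1 %cond, <4 x i32> <i32 2, i32 2, i32 2, i32 2>,
                                <4 x i32> <i32 5, i32 5, i32 5, i32 5>
        %res = shl <4 x i32> %x, %amt

        ; After: shift both sides of the select, so each shift uses a uniform
        ; amount that pre-AVX2 targets can lower without expansion.
        %s0   = shl <4 x i32> %x, <i32 2, i32 2, i32 2, i32 2>
        %s1   = shl <4 x i32> %x, <i32 5, i32 5, i32 5, i32 5>
        %res2 = select i1 %cond, <4 x i32> %s0, <4 x i32> %s1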
* [x86] split 256-bit vector selects if operands are vector concats (Sanjay Patel, 2019-06-16, 1 file, +36/-0)

    This is similar logic/motivation to the select splitting in D62969.

    In D63233, the pattern changes so that we no longer have an
    extract_subvector of vselect, but the operands of the select are still being
    concatenated.

    The closest case is represented in either the first or last test diffs here:
    we have an extra instruction, but we converted 3-4 ymm instructions into 4-5
    xmm instructions. I think that's the right trade-off for most AVX1 targets.

    In the example based on PR37428:
    https://bugs.llvm.org/show_bug.cgi?id=37428
    ...this makes the loop about 30% faster (tested on Haswell by compiling with
    -mavx).

    Differential Revision: https://reviews.llvm.org/D63364
    llvm-svn: 363508
* [X86] CombineShuffleWithExtract - handle cases with different vector extract sources (Simon Pilgrim, 2019-06-17, 1 file, +28/-6)

    Insert the shorter vector source into an undef vector of the longer vector
    source's type.

    llvm-svn: 363507
* [X86] CombineShuffleWithExtract - assert all src ops types are multiples of rootsize. NFCI. (Simon Pilgrim, 2019-06-15, 1 file, +2/-1)
    llvm-svn: 363501
* [X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles (Simon Pilgrim, 2019-06-15, 1 file, +79/-49)

    Pull out the existing (non)lane-crossing fold into a helper lambda and use it
    for lane-crossing unary shuffles as well.

    Fixes PR34380

    llvm-svn: 363500
* [X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3) (Simon Pilgrim, 2019-06-15, 1 file, +23/-0)

    This mostly happens due to SimplifyDemandedVectorElts reducing a vector to
    insert_subvector(undef, c1, 0).

    llvm-svn: 363499
* [Clang] Harmonize Split DWARF options with llc (Aaron Puchert, 2019-06-15, 1 file, +4/-3)

    Summary:
    With Split DWARF the resulting object file (then called skeleton CU)
    contains the file name of another ("DWO") file with the debug info. This can
    be a problem for remote compilation, as it will contain the name of the file
    on the compilation server, not on the client.

    To use Split DWARF with remote compilation, one needs to either
    * make sure only relative paths are used, and mirror the build directory
      structure of the client on the server,
    * inject the desired file name on the client directly.

    Since llc already supports the latter solution, we're just copying that
    over. We allow setting the actual output filename separately from the value
    of the DW_AT_[GNU_]dwo_name attribute in the skeleton CU.

    Fixes PR40276.

    Reviewers: dblaikie, echristo, tejohnson
    Reviewed By: dblaikie
    Differential Revision: https://reviews.llvm.org/D59673
    llvm-svn: 363496
* [PowerPC] Set the innermost hot loop to align 32 bytes (Kang Zhang, 2019-06-15, 1 file, +12/-0)

    Summary:
    If the nested loop is an innermost loop, prefer a 32-byte alignment so that
    we can decrease cache misses and branch-prediction misses. The actual
    alignment of the loop will depend on the hotness check and other logic in
    alignBlocks.

    The old code would only align a hot loop to 32 bytes when the LoopSize was
    larger than 16 bytes and smaller than 32 bytes; this patch aligns the
    innermost hot loop to 32 bytes, not only hot loops whose size is 16-32
    bytes.

    Reviewed By: steven.zhang, jsji
    Differential Revision: https://reviews.llvm.org/D61228
    llvm-svn: 363495
* [clang] Add storage for APValue in ConstantExpr (Gauthier Harnisch, 2019-06-15, 1 file, +36/-0)

    Summary:
    When using ConstantExpr we often need the result of the expression to be
    kept in the AST. Currently this is done ad hoc by each node that needs the
    result, and has been done multiple times (for enumerators, for constexpr
    variables, ...). This patch adds to ConstantExpr the ability to store the
    result of evaluating the expression. No functional changes are expected.

    Changes:
    - Add a trailing object to ConstantExpr that can hold an APValue or a
      uint64_t. The uint64_t is there because most ConstantExprs yield integral
      values, so there is an optimized layout for them.
    - Add basic* serialization support for the trailing result.
    - Move the conversion functions from an enum to a fltSemantics from
      clang::FloatingLiteral to llvm::APFloatBase. This change is to make them
      usable for serializing APValues.
    - Add basic* Import support for the trailing result.
    - ConstantExprs created in CheckConvertedConstantExpression now store the
      result in the ConstantExpr node.
    - Adapt the AST dump to print the result when present.

    basic*: None, Indeterminate, Int, Float, FixedPoint, ComplexInt,
    ComplexFloat; the result is not yet used anywhere except for -ast-dump.

    Reviewers: rsmith, martong, shafik
    Reviewed By: rsmith
    Subscribers: rnkovacs, hiraditya, dexonsmith, cfe-commits, llvm-commits
    Tags: #clang, #llvm
    Differential Revision: https://reviews.llvm.org/D62399
    llvm-svn: 363493
* [BranchProbability] Delete a redundant overflow check (Fangrui Song, 2019-06-15, 1 file, +0/-4)
    llvm-svn: 363492
* [SCEV] Use unsigned/signed intersection type in SCEV (Nikita Popov, 2019-06-15, 1 file, +32/-19)

    Based on D59959, this switches SCEV to use unsigned/signed range
    intersection based on the sign hint. This will prefer non-wrapping ranges in
    the relevant domain. I've left the one intersection in getRangeForAffineAR()
    to use the smallest-intersection heuristic, as there doesn't seem to be any
    obvious preference there.

    Differential Revision: https://reviews.llvm.org/D60035
    llvm-svn: 363490
* [SimplifyIndVar] Simplify non-overflowing saturating add/sub (Nikita Popov, 2019-06-15, 1 file, +24/-0)

    If we can detect that saturating math that depends on an IV cannot overflow,
    replace it with simple math. This is similar to the CVP optimization from
    D62703, just based on a different underlying analysis (SCEV vs LVI) that
    catches different cases.

    Differential Revision: https://reviews.llvm.org/D62792
    llvm-svn: 363489
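    An illustrative IR sketch (constructed here, not from the patch) of the
    rewrite this enables when SCEV proves the saturating add can never saturate:

        ; inside a loop where SCEV proves %iv <= 100, this can never hit the
        ; i8 ceiling of 255 ...
        %v = call i8 @llvm.uadd.sat.i8(i8 %iv, i8 7)
        ; ... so it can be replaced with plain, provably non-wrapping math:
        %v2 = add nuw i8 %iv, 7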
* [RISCV] Simplify RISCVAsmBackend::writeNopData(). NFC (Fangrui Song, 2019-06-15, 1 file, +3/-7)
    llvm-svn: 363486
* adding more fmf propagation for selects plus updated tests (Michael Berg, 2019-06-15, 2 files, +37/-20)
    llvm-svn: 363484
* Revert "adding more fmf propagation for selects plus tests"Fangrui Song2019-06-152-37/+20
| | | | | | | | | | | This reverts rL363474. -debug-only=isel was added to some tests that don't specify `REQUIRES: asserts`. This causes failures on -DLLVM_ENABLE_ASSERTIONS=off builds. I chose to revert instead of fixing the tests because I'm not sure whether we should add `REQUIRES: asserts` to more tests. llvm-svn: 363482
* Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect"Matt Arsenault2019-06-152-7/+97
| | | | | | | This reapplies r363410, avoiding null dereference if there is no AltRegBank. llvm-svn: 363478
* Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect"Mitch Phillips2019-06-142-96/+7
| | | | | | | | | | | This patch breaks UBSan build bots. See https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for a guide as to how to reproduce the error. This reverts commit c2864c0de07efb5451d32d27a7d4ff2984830929. This reverts rL363410. llvm-svn: 363476
* adding more fmf propagation for selects plus tests (Michael Berg, 2019-06-14, 2 files, +37/-20)
    llvm-svn: 363474
* [MBP] Move a latch block with conditional exit and multiple predecessors to top of loop (Guozhi Wei, 2019-06-14, 1 file, +233/-50)

    The current findBestLoopTop can find and move one kind of block to the top:
    a latch block with a single successor. Another common case is:

    * a latch block
    * it has two successors, one is the loop header, the other is the exit
    * it has more than one predecessor

    If such a block is below one of its predecessors P, only P can fall through
    to it; all other predecessors need a jump to it, followed by another
    conditional jump to the loop header. If it is moved before the loop header,
    all its predecessors jump to it and then fall through to the loop header, so
    every predecessor except P saves one taken branch.

    Differential Revision: https://reviews.llvm.org/D43256
    llvm-svn: 363471
* [ObjC][ARC] Delete ObjC runtime calls on global variables annotated with 'objc_arc_inert' (Akira Hatanaka, 2019-06-14, 2 files, +48/-0)

    Those calls are no-ops, so they can be safely deleted.

    rdar://problem/49839633

    Differential Revision: https://reviews.llvm.org/D62433
    llvm-svn: 363468
* AMDGPU: Avoid most waitcnts before calls (Matt Arsenault, 2019-06-14, 1 file, +90/-66)

    Currently you get extra waits, because waits are inserted for the register
    dependencies of the call, and the function prolog waits on everything.

    Currently waits are still inserted on returns. It may make sense to not do
    this, and wait in the caller instead.

    llvm-svn: 363465
* Add --print-supported-cpus flag for clang. (Ziang Wan, 2019-06-14, 1 file, +34/-1)

    This patch allows clang users to print out a list of supported CPU models
    using

        clang [--target=<target triple>] --print-supported-cpus

    Then, users can select the CPU model to compile to using

        clang --target=<triple> -mcpu=<model> a.c

    It is a handy feature to help cross compilation.

    llvm-svn: 363464
* SROA: Allow eliminating addrspacecasted allocas (Matt Arsenault, 2019-06-14, 2 files, +50/-9)

    There is a circular dependency between SROA and InferAddressSpaces today
    that requires running both multiple times in order to be able to eliminate
    all simple allocas and addrspacecasts. InferAddressSpaces can't remove
    addrspacecasts when written to memory, and SROA helps move pointers out of
    memory.

    This should avoid inserting new commuting addrspacecasts with GEPs, since
    there are unresolved questions about pointer wrapping between different
    address spaces.

    For now, don't replace volatile operations that don't match the alloca
    addrspace, as it would change the address space of the access. It may still
    be OK to insert an addrspacecast from the new alloca, but be more
    conservative for now.

    llvm-svn: 363462
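    A small constructed example of an alloca SROA can now promote despite the
    cast (address space 5 is used here in the AMDGPU sense of a private/stack
    address space; the numbers are purely illustrative):

        %tmp = alloca i32, addrspace(5)
        %p   = addrspacecast i32 addrspace(5)* %tmp to i32*
        store i32 42, i32* %p
        %v   = load i32, i32* %p
        ; SROA can now look through the addrspacecast and promote %tmp to the
        ; SSA value 42, removing the alloca and the cast entirely.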
* [PowerPC][NFC] Comments update and remove some unused def (Jinsong Ji, 2019-06-14, 2 files, +2/-20)
    llvm-svn: 363461
* AMDGPU: Fix dropping memref for ds append/consume (Matt Arsenault, 2019-06-14, 1 file, +3/-1)

    The way SelectionDAG treats memory operands is very frustrating, and by
    default drops them unless a property is set on the pattern. There is no
    pattern for manually selected instructions, so this requires manually
    setting them.

    llvm-svn: 363455
* AMDGPU: Set isTrap on S_TRAP (Matt Arsenault, 2019-06-14, 1 file, +4/-1)

    This seems to only be used for generating some kind of documentation, but
    might as well set it.

    llvm-svn: 363454
* [JITLink] Move JITLinkMemoryManager into its own header. (Lang Hames, 2019-06-14, 3 files, +106/-90)
    llvm-svn: 363444
* [Remarks] Use the RemarkSetup error in setupOptimizationRemarks (Francis Visoiu Mistrih, 2019-06-14, 1 file, +2/-2)

    The errors were added in r363415 but were not used in the RemarkStreamer.

    llvm-svn: 363439
* [GlobalISel] Add a G_BRJT opcode. (Amara Emerson, 2019-06-14, 2 files, +23/-0)

    This is a branch opcode that takes a jump table pointer, jump table index
    and an index into the table to do an indirect branch.

    We pass both the table pointer and JTI to allow targets like ARM64 to more
    easily use the existing jump table compression optimization without having
    to walk up the block to find a paired G_JUMP_TABLE.

    Differential Revision: https://reviews.llvm.org/D63159
    llvm-svn: 363434
* Revert "Fix a bug w/inbounds invalidation in LFTR" (Florian Hahn, 2019-06-14, 1 file, +11/-83)

    Reverting because it breaks a green dragon build:
    http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208

    This reverts r363289 (git commit eb88badff96dacef8fce3f003dec34c2ef6900bf).

    llvm-svn: 363427