summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2017-02-284-371/+380
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296476
* [globalisel] Change LLT constructor string into an LLT subclass that knows ↵Daniel Sanders2017-02-287-61/+77
| | | | | | | | | | | | | | | | | | how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 296474
* [ARM] GlobalISel: Lower i32 and fp call parameters on the stackDiana Picus2017-02-281-7/+31
| | | | | | | | | | | | Lower i32, float and double parameters that need to live on the stack. This boils down to creating some G_GEPs starting from the stack pointer and storing the values there. During the process we also keep track of the stack size and use the final value in the ADJCALLSTACKDOWN/UP instructions. We currently assert for smaller types, since they usually require extensions. They will be handled in a separate patch. llvm-svn: 296473
* [ARM] GlobalISel: Select 32-bit G_CONSTANTDiana Picus2017-02-281-0/+11
| | | | | | Put it into a register by means of a MOVi. llvm-svn: 296471
* [ARM] GlobalISel: Add mapping for G_CONSTANTDiana Picus2017-02-281-0/+1
| | | | | | | Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register operand. llvm-svn: 296469
* [ARM] GlobalISel: Legalize 32-bit constantsDiana Picus2017-02-281-0/+2
| | | | llvm-svn: 296468
* Reformat a blank line.NAKAMURA Takumi2017-02-281-1/+1
| | | | llvm-svn: 296464
* Revert r296442 (and r296443), "Allow externally dlopen-ed libraries to be ↵NAKAMURA Takumi2017-02-282-51/+14
| | | | | | | | registered as permanent libraries." It broke clang/test/Analysis/checker-plugins.c llvm-svn: 296463
* [ARM] GlobalISel: Select G_GEPDiana Picus2017-02-281-0/+1
| | | | | | At this point, G_GEP is just an add, so we treat it exactly like a G_ADD. llvm-svn: 296462
* [ARM] Diagnose PC-writing instructions in IT blocksOliver Stannard2017-02-281-4/+7
| | | | | | | | | | | | In Thumb2, instructions which write to the PC are UNPREDICTABLE if they are in an IT block but not the last instruction in the block. Previously, we only diagnosed this for LDM instructions, this patch extends the diagnostic to cover all of the relevant instructions. Differential Revision: https://reviews.llvm.org/D30398 llvm-svn: 296459
* [ARM] GlobalISel: Add reg bank mapping for G_GEPDiana Picus2017-02-281-0/+1
| | | | | | This should be the same as the mapping for G_ADD etc. llvm-svn: 296455
* [ARM] GlobalISel: Legalize G_GEP with 32-bit offsetsDiana Picus2017-02-281-0/+3
| | | | | | | At the moment we're only interested in GEPs for putting call parameters on the stack, so we'll stick to 32-bit offsets. llvm-svn: 296452
* Test commit, fix typo, NFC.Vadzim Dambrouski2017-02-281-1/+1
| | | | llvm-svn: 296447
* Fix Win bots.Vassil Vassilev2017-02-281-2/+2
| | | | llvm-svn: 296443
* Allow externally dlopen-ed libraries to be registered as permanent libraries.Vassil Vassilev2017-02-282-13/+50
| | | | | | | | | | This is also useful in cases when llvm is in a shared library. First we dlopen the llvm shared library and then we register it as a permanent library in order to keep the JIT and other services working. Patch reviewed by Vedant Kumar (D29955)! llvm-svn: 296442
* [ImplicitNullCheck] Add alias analysis usageSanjoy Das2017-02-281-27/+75
| | | | | | | | | | | | | | | | | | | Summary: With this change ImplicitNullCheck optimization uses alias analysis and can use load/store memory access for implicit null check if there are other load/store before but memory accesses do not alias. Patch by Serguei Katkov! Reviewers: sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30331 llvm-svn: 296440
* Empty line. NFCIXin Tong2017-02-281-1/+0
| | | | llvm-svn: 296438
* [LoopUnswitch] Common pushing LIC's user to worklist.Xin Tong2017-02-281-4/+2
| | | | llvm-svn: 296432
* Revert "Add MIR-level outlining pass"Matthias Braun2017-02-286-1508/+0
| | | | | | | | Revert Machine Outliner for now, as it breaks the asan bot. This reverts commit r296418. llvm-svn: 296426
* Add MIR-level outlining passMatthias Braun2017-02-286-0/+1508
| | | | | | | | | | | | | | | | | | | | | | | | | | This is a patch for the outliner described in the RFC at: http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html The outliner is a code-size reduction pass which works by finding repeated sequences of instructions in a program, and replacing them with calls to functions. This is useful to people working in low-memory environments, where sacrificing performance for space is acceptable. This adds an interprocedural outliner directly before printing assembly. For reference on how this would work, this patch also includes X86 target hooks and an X86 test. The outliner is run like so: clang -mno-red-zone -mllvm -enable-machine-outliner file.c Patch by Jessica Paquette<jpaquette@apple.com>! rdar://29166825 Differential Revision: https://reviews.llvm.org/D26872 llvm-svn: 296418
* [CGP] Split some critical edges coming out of indirect branchesMichael Kuperstein2017-02-281-0/+157
| | | | | | | | | | | | | | | | | | | | | | Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the *direct* branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296416
* [PDB] Make streams carry their own endianness.Zachary Turner2017-02-2818-92/+80
| | | | | | | | | | | | | Before the endianness was specified on each call to read or write of the StreamReader / StreamWriter, but in practice it's extremely rare for streams to have data encoded in multiple different endiannesses, so we should optimize for the 99% use case. This makes the code cleaner and more general, but otherwise has NFC. llvm-svn: 296415
* [DebugInfo] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-02-279-46/+83
| | | | | | other minor fixes (NFC). llvm-svn: 296413
* [SLP] Load sorting should not try to sort things that aren't loads.Michael Kuperstein2017-02-271-0/+5
| | | | | | | We may get a VL where the first element is a load, but the others aren't. Trying to sort such VLs can only lead to sorrow. llvm-svn: 296411
* [MC] Implement the COFF directives in MCNullStreamer.Dan Gohman2017-02-271-0/+4
| | | | | | This fixes -filetype=null errors introduced in r296403. llvm-svn: 296410
* AMDGPU: Basic folds for fmed3 intrinsicMatt Arsenault2017-02-272-0/+84
| | | | | | | Constant fold, canonicalize constants to RHS, reduce to minnum/maxnum when inputs are nan/undef. llvm-svn: 296409
* Remove some code accidentally left in.Zachary Turner2017-02-271-2/+0
| | | | llvm-svn: 296407
* [AddressSanitizer] Put shadow at 0 for FuchsiaPetr Hosek2017-02-271-1/+6
| | | | | | | | | | The Fuchsia ASan runtime reserves the low part of the address space. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30426 llvm-svn: 296405
* [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-02-276-31/+97
| | | | | | other minor fixes (NFC). llvm-svn: 296404
* [MC] Factor out non-COFF handling of COFF-specific directives.Dan Gohman2017-02-274-52/+12
| | | | | | | | | Instead of requiring every non-COFF MCObjectStreamer to implement the COFF hooks just to do an llvm_unreachable to say that they're not supported, do the llvm_unreachable in the default implementation, as suggested by rnk in https://reviews.llvm.org/D26722. llvm-svn: 296403
* [WebAssembly] Add some comments and tidy up whitespace.Dan Gohman2017-02-274-5/+6
| | | | llvm-svn: 296402
* AMDGPU: Use v_med3_{f16|i16|u16}Matt Arsenault2017-02-277-33/+52
| | | | llvm-svn: 296401
* [WebAssembly] Split CFG-sorting into its own pass. NFC.Dan Gohman2017-02-277-223/+302
| | | | | | | CFG sorting was already an independent algorithm from block/loop insertion; this change makes it more convenient to debug. llvm-svn: 296399
* Revert r296366 "[InlineFunction] add nonnull assumptions based on argument ↵Hans Wennborg2017-02-271-36/+22
| | | | | | | | attributes" It causes miscompiles e.g. during self-host of Clang (PR32082). llvm-svn: 296398
* AMDGPU: Support v2i16/v2f16 packed operationsMatt Arsenault2017-02-2711-63/+378
| | | | llvm-svn: 296396
* ISel: We need to notify FastIS of the IMPLICIT_DEF we created in ↵Arnold Schwaighofer2017-02-271-1/+7
| | | | | | | | | | createSwiftErrorEntriesInEntryBlock Otherwise, it will insert instructions before it. rdar://30536186 llvm-svn: 296395
* [PDB] Partial resubmit of r296215, which improved PDB Stream Library.Zachary Turner2017-02-2730-205/+187
| | | | | | | | | | | | | | | | | This was reverted because it was breaking some builds, and because of incorrect error code usage. Since the CL was large and contained many different things, I'm resubmitting it in pieces. This portion is NFC, and consists of: 1) Renaming classes to follow a consistent naming convention. 2) Fixing the const-ness of the interface methods. 3) Adding detailed doxygen comments. 4) Fixing a few instances of passing `const BinaryStream& X`. These are now passed as `BinaryStreamRef X`. llvm-svn: 296394
* Revert "DAG: Check if extract_vector_elt is legal or custom"Matt Arsenault2017-02-271-1/+1
| | | | | | | This reverts r295782. This could potentially result in some legalization loops and I avoided the need for this. llvm-svn: 296393
* Empty line. NFCIXin Tong2017-02-271-1/+0
| | | | llvm-svn: 296392
* [PGO] Fix a bug in reading text format value profile.Rong Xu2017-02-271-2/+3
| | | | | | | | | | | | | | Summary: Should use the Valuekind read from the profile. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits, xur Differential Revision: https://reviews.llvm.org/D30420 llvm-svn: 296391
* [ARM] don't transform an add(ext Cond), C to select unless there's a setcc ↵Sanjay Patel2017-02-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | of the condition The transform in question claims to be doing: // fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c)) ...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node for the sext/zext patterns. This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(), so I was seeing infinite loops with my draft of a patch applied. The changes in select_const.ll look positive (less instructions). The change in arm-and-tst-peephole.ll is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's not affected by this code change. Differential Revision: https://reviews.llvm.org/D30355 llvm-svn: 296389
* AMDGPU: Add some of the new gfx9 VOP3 instructionsMatt Arsenault2017-02-271-0/+12
| | | | llvm-svn: 296382
* [X86][SSE] Attempt to extract vector elements through target shufflesSimon Pilgrim2017-02-272-0/+112
| | | | | | | | | | DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it. Differential Revision: https://reviews.llvm.org/D30176 llvm-svn: 296381
* AMDGPU: Support inlineasm for packed instructionsMatt Arsenault2017-02-271-1/+42
| | | | | | | Add packed types as legal so they may be used with inlineasm. Keep all operations expanded for now. llvm-svn: 296379
* AMDGPU: Don't fold immediate if clamp/omod are setMatt Arsenault2017-02-272-8/+13
| | | | | | | Doesn't fix any practical problems because clamp/omod are currently folded after peephole optimizer. llvm-svn: 296375
* AMDGPU: Fold omod into instructionsMatt Arsenault2017-02-273-6/+146
| | | | llvm-svn: 296372
* [TailDuplicator] Maintain DebugLoc for branch instructionsTaewook Oh2017-02-271-1/+2
| | | | | | | | | | | | | | Summary: Existing implementation of duplicateSimpleBB function drops DebugLoc metadata of branch instructions during the transformation. This patch addresses this issue by making newly created branch instructions to keep the metadata of replaced branch instructions. Reviewers: qcolombet, craig.topper, aprantl, MatzeB, sanjoy, dblaikie Reviewed By: dblaikie Subscribers: dblaikie, llvm-commits Differential Revision: https://reviews.llvm.org/D30026 llvm-svn: 296371
* AMDGPU: Add f16 to shader calling conventionsMatt Arsenault2017-02-271-3/+3
| | | | | | Mostly useful for writing tests for f16 features. llvm-svn: 296370
* AMDGPU: Add VOP3P instruction formatMatt Arsenault2017-02-2723-86/+879
| | | | | | | | Add a few non-VOP3P but instructions related to packed. Includes hack with dummy operands for the benefit of the assembler llvm-svn: 296368
* [InlineFunction] add nonnull assumptions based on argument attributesSanjay Patel2017-02-271-22/+36
| | | | | | | | | | | This was suggested in D27855: have the inliner add assumptions, so we don't lose nonnull info provided by argument attributes. This still doesn't solve PR28430 (dyn_cast), but this gets us closer. https://reviews.llvm.org/D29999 llvm-svn: 296366
OpenPOWER on IntegriCloud