summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* Remove the unused Offset field from MachineLocation (NFC)Adrian Prantl2017-08-022-11/+3
| | | | | | rdar://problem/33580047 llvm-svn: 309831
* [DAG] Improve candidate pruning in store merge failure case. NFCINirav Dave2017-08-021-20/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During store merge we construct a sorted list of consecutive store candidates and consider subsequences for merging into a single store. For each subsequence we check if the stored value type is legal the merged store would have valid and fast and if the constructed value to be stored is valid. The only properties that affect this check between subsequences is the size of the subsequence, the alignment of the first store, the alignment of the stored load value (when merging stores-of-loads), and whether the merged value is a constant zero. If we do not find a viable mergeable subsequence starting from the first store of length N, we know that a subsequence starting at a later store of length N will also fail unless the new store's alignment, the new load's alignment (if we're merging store-of-loads), or we've dropped stores of nonzero value and could construct a merged stores of zero (for merging constants). As a result if we fail to find a valid subsequence starting from the first store we can safely skip considering subsequences that start with subsequent stores unless one of the above properties is true. This significantly (2x) improves compile time in some pathological cases. Reviewers: RKSimon, efriedma, zvi, spatel, waltl Subscribers: grandinj, llvm-commits Differential Revision: https://reviews.llvm.org/D35901 llvm-svn: 309830
* Remove unused includes of MachineLocation.h (NFC)Adrian Prantl2017-08-023-2/+1
| | | | llvm-svn: 309824
* Remove unreachable code. (NFC)Adrian Prantl2017-08-024-26/+5
| | | | | | | | MachineLocation::getOffset() always returns 0. rdar://problem/33580047 llvm-svn: 309823
* [AArch64] Simplify AES*Tied pseudo expansion (NFC).Florian Hahn2017-08-021-10/+3
| | | | | | | | | | | | | | | | Summary: Suggested by @t.p.northover in https://bugs.llvm.org/show_bug.cgi?id=34015. Reviewers: javed.absar, t.p.northover, rengolin Reviewed By: t.p.northover Subscribers: aemerson, kristof.beyls, llvm-commits, t.p.northover Differential Revision: https://reviews.llvm.org/D36223 llvm-svn: 309821
* [InlineCost] Remove redundant call. NFC.Chad Rosier2017-08-021-2/+3
| | | | llvm-svn: 309819
* [InlineCost] Simplify more 'and' and 'or' operations.Chad Rosier2017-08-021-0/+30
| | | | | | Differential Revision: https://reviews.llvm.org/D35856 llvm-svn: 309817
* [SLPVectorizer] Generalize interface of functions, NFC.Alexey Bataev2017-08-021-11/+15
| | | | llvm-svn: 309816
* [SLP] Fix for PR31880: shuffle and vectorize repeated scalar ops on ↵Alexey Bataev2017-08-021-0/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | extracted elements Summary: Currently most of the time vectors of extractelement instructions are treated as scalars that must be gathered into vectors. But in some cases, like when we have extractelement instructions from single vector with different constant indeces or from 2 vectors of the same size, we can treat this operations as shuffle of a single vector or blending of 2 vectors. ``` define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) { %x0 = extractelement <2 x i8> %x, i32 0 %y1 = extractelement <2 x i8> %y, i32 1 %x0x0 = mul i8 %x0, %x0 %y1y1 = mul i8 %y1, %y1 %ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0 %ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1 ret <2 x i8> %ins2 } ``` can be converted to something like ``` define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) { %1 = shufflevector <2 x i8> %x, <2 x i8> %y, <2 x i32> <i32 0, i32 3> %2 = mul <2 x i8> %1, %1 ret <2 x i8> %2 } ``` Currently this type of conversion is considered as high cost transformation. Reviewers: mzolotukhin, delena, mkuper, hfinkel, RKSimon Subscribers: ashahid, RKSimon, spatel, llvm-commits Differential Revision: https://reviews.llvm.org/D30200 llvm-svn: 309812
* [MIR] Print target-specific constant poolsDiana Picus2017-08-022-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | This should enable us to test the generation of target-specific constant pools, e.g. for ARM: constants: - id: 0 value: 'g(GOT_PREL)-(LPC0+8-.)' alignment: 4 isTargetSpecific: true I intend to use this to test PIC support in GlobalISel for ARM. This is difficult to test outside of that context, since the existing MIR tests usually rely on parser support as well, and that seems a bit trickier to add. We could try to add a unit test, but the setup for that seems rather convoluted and overkill. We do test however that the parser reports a nice error when encountering a target-specific constant pool. Differential Revision: https://reviews.llvm.org/D36092 llvm-svn: 309806
* [NewGVN] Fold single-use variables. NFCI.Davide Italiano2017-08-021-5/+3
| | | | llvm-svn: 309790
* [NewGVN] Remove a (now stale) comment. NFCI.Davide Italiano2017-08-021-1/+0
| | | | llvm-svn: 309789
* Fix the bug that parseAAPipeline is not invoked in runNewPMPasses in release ↵Dehao Chen2017-08-021-8/+9
| | | | | | | | | | | | | | | | compiler. Summary: The logic is guarded by "assert". Reviewers: davidxl, davide, chandlerc Reviewed By: davide, chandlerc Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36195 llvm-svn: 309787
* [SimplifyCFG] Fix typo in comment. NFCCraig Topper2017-08-021-1/+1
| | | | llvm-svn: 309785
* [PM] Fix a bug where through CGSCC iteration we can getChandler Carruth2017-08-021-2/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | infinite-inlining across multiple runs of the inliner by keeping a tiny history of internal-to-SCC inlining decisions. This is still a bit gross, but I don't yet have any fundamentally better ideas and numerous people are blocked on this to use new PM and ThinLTO together. The core of the idea is to detect when we are about to do an inline that has a chance of re-splitting an SCC which we have split before with a similar inlining step. That is a critical component in the inlining forming a cycle and so far detects all of the various cyclic patterns I can come up with as well as the original real-world test case (which comes from a ThinLTO build of libunwind). I've added some tests that I think really demonstrate what is going on here. They are essentially state machines that march the inliner through various steps of a cycle and check that we stop when the cycle is closed and that we actually did do inlining to form that cycle. A lot of thanks go to Eric Christopher and Sanjoy Das for the help understanding this issue and improving the test cases. The biggest "yuck" here is the layering issue -- the CGSCC pass manager is providing somewhat magical state to the inliner for it to use to make itself converge. This isn't great, but I don't honestly have a lot of better ideas yet and at least seems nicely isolated. I have tested this patch, and it doesn't block *any* inlining on the entire LLVM test suite and SPEC, so it seems sufficiently narrowly targeted to the issue at hand. We have come up with hypothetical issues that this patch doesn't cover, but so far none of them are practical and we don't have a viable solution yet that covers the hypothetical stuff, so proceeding here in the interim. Definitely an area that we will be back and revisiting in the future. Differential Revision: https://reviews.llvm.org/D36188 llvm-svn: 309784
* AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to itMatt Arsenault2017-08-023-6/+60
| | | | llvm-svn: 309783
* AMDGPU: Fix emitting encoded callsMatt Arsenault2017-08-022-4/+11
| | | | | | | | | | This was failing on out of bounds access to the extra operands on the s_swappc_b64 beyond those in the instruction definition. This was working, but somehow regressed within the past few weeks, although I don't see any obvious commit. llvm-svn: 309782
* AMDGPU: Analyze callee resource usage in AsmPrinterMatt Arsenault2017-08-025-19/+186
| | | | llvm-svn: 309781
* Update the new PM pipeline to make ICP aware if it is SamplePGO build.Dehao Chen2017-08-022-25/+33
| | | | | | | | | | | | | | Summary: In ThinLTO backend compile, OPTOptions are not set so that the ICP in ThinLTO backend does not know if it is a SamplePGO build, in which profile count needs to be annotated directly on call instructions. This patch cleaned up the PGOOptions handling logic and passes down PGOOptions to ThinLTO backend. Reviewers: chandlerc, tejohnson, davidxl Reviewed By: chandlerc Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36052 llvm-svn: 309780
* [AMDGPU] Fix asan error after last commitStanislav Mekhanoshin2017-08-021-1/+1
| | | | | | | | | | Previous change "Turn s_and_saveexec_b64 into s_and_b64 if result is unused" introduced asan use-after-poison error. Instruction was analyzed after eraseFromParent() calls. Move analysys higher than erase. llvm-svn: 309779
* [DAG] Refactor store merge subexpressions. NFC.Nirav Dave2017-08-021-23/+28
| | | | | | Distribute various expressions across ifs. llvm-svn: 309777
* AMDGPU: Don't place arguments in emergency stack slotMatt Arsenault2017-08-021-1/+9
| | | | | | | | When finding the fixed offsets for function arguments, this needs to skip over the 4 bytes reserved for the emergency stack slot. llvm-svn: 309776
* DAG: Undo and->or combine with FrameIndexesMatt Arsenault2017-08-021-0/+9
| | | | | | | | | | | | | | This pattern shows up when lowering byval copies on AMDGPU. The byval object access is split into 4-byte chunks, adding a constant offset to the FixedStack base. When some of the offsets turn into ors, this prevents combining the constant offsets. This makes it not apparent that the object is there when matching addressing modes, so it ends up using a scratch wave offset relative access and the lengthy frame index expansion for that. llvm-svn: 309775
* Update LiveDebugValues to generate DIExpressions for spill offsetsAdrian Prantl2017-08-021-2/+7
| | | | | | | | | | | instead of using the deprecated offset field of DBG_VALUE. This has no observable effect on the generated DWARF, but the assembler comments will look different. rdar://problem/33580047 llvm-svn: 309773
* [AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unusedStanislav Mekhanoshin2017-08-012-1/+65
| | | | | | | | | | With SI_END_CF elimination for some nested control flow we can now eliminate saved exec register completely by turning a saveexec version of instruction into just a logical instruction. Differential Revision: https://reviews.llvm.org/D36007 llvm-svn: 309766
* [AMDGPU] Collapse adjacent SI_END_CFStanislav Mekhanoshin2017-08-014-0/+168
| | | | | | | | | | | | | | | | Add a pass to remove redundant S_OR_B64 instructions enabling lanes in the exec. If two SI_END_CF (lowered as S_OR_B64) come together without any vector instructions between them we can only keep outer SI_END_CF, given that CFG is structured and exec bits of the outer end statement are always not less than exec bit of the inner one. This needs to be done before the RA to eliminate saved exec bits registers but after register coalescer to have no vector registers copies in between of different end cf statements. Differential Revision: https://reviews.llvm.org/D35967 llvm-svn: 309762
* [SCEV/IndVars] Always compute loop exiting values if the backedge count is 0Sanjoy Das2017-08-011-0/+19
| | | | | | | | | If SCEV can prove that the backedge taken count for a loop is zero, it does not need to "understand" a recursive PHI to compute its exiting value. This should fix PR33885. llvm-svn: 309758
* Use helper function instead of manually constructing DBG_VALUEs (NFC)Adrian Prantl2017-08-012-17/+9
| | | | | | rdar://problem/33580047 llvm-svn: 309757
* Remove PrologEpilogInserter's usage of DBG_VALUE's offset fieldAdrian Prantl2017-08-014-11/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the last half-dozen commits to LLVM I removed code that became dead after removing the offset parameter from llvm.dbg.value gradually proceeding from IR towards the backend. Before I can move on to DwarfDebug and friends there is one last side-called offset I need to remove: This patch modifies PrologEpilogInserter's use of the DBG_VALUE's offset argument to use a DIExpression instead. Because the PrologEpilogInserter runs at the Machine level I had to play a little trick with a named llvm.dbg.mir node to get the DIExpressions to print in MIR dumps (which print the llvm::Module followed by the MachineFunction dump). I also had to add rudimentary DwarfExpression support to CodeView and as a side-effect also fixed a bug (CodeViewDebug::collectVariableInfo was supposed to give up on variables with complex DIExpressions, but would fail to do so for fragments, which are also modeled as DIExpressions). With this last holdover removed we will have only one canonical way of representing offsets to debug locations which will simplify the code in DwarfDebug (and future versions of CodeViewDebug once it starts handling more complex expressions) and make it easier to reason about. This patch is NFC-ish: All test case changes are for assembler comments and the binary output does not change. rdar://problem/33580047 Differential Revision: https://reviews.llvm.org/D36125 llvm-svn: 309751
* [AArch64] Fix a typo in isExtFreeImpl()Haicheng Wu2017-08-011-1/+1
| | | | | | | | next => not Differential Revision: https://reviews.llvm.org/D36104 llvm-svn: 309748
* [llvm-cov] Allow specifying distinct architectures for each loaded binaryVedant Kumar2017-08-011-3/+4
| | | | | | | | | | | The coverage tool needs to know which slice to look at when it's handed a universal binary. Some projects need to look at aggregate coverage reports for a variety of slices in different binaries: this patch adds support for these kinds of projects to llvm-cov. rdar://problem/33579007 llvm-svn: 309747
* [Hexagon] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-08-0125-398/+562
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 309746
* [AArch64] Rewrite stack frame handling for win64 vararg functionsMartin Storsjo2017-08-011-22/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous attempt, which made do with a single offset in computeCalleeSaveRegisterPairs, wasn't quite enough. The previous attempt only worked as long as CombineSPBump == true (since the offset would be adjusted later in fixupCalleeSaveRestoreStackOffset). Instead include the size for the fixed stack area used for win64 varargs in calculations in emitPrologue/emitEpilogue. The stack consists of mainly three parts; - AFI->getLocalStackSize() - AFI->getCalleeSavedStackSize() - FixedObject Most of the places in the code which previously used the CSStackSize now use PrologueSaveSize instead, which is the sum of the latter two, while some cases which need exactly the middle one use AFI->getCalleeSavedStackSize() explicitly instead of a local variable. In addition to moving the offsetting into emitPrologue/emitEpilogue (which fixes functions with CombineSPBump == false), also set the frame pointer to point to the right location, where the frame pointer and link register actually are stored. In addition to the prologue/epilogue, this also requires changes to resolveFrameIndexReference. Add tests for a function that keeps a frame pointer and another one that uses a VLA. Differential Revision: https://reviews.llvm.org/D35919 llvm-svn: 309744
* AMDGPU: Fix handling of div_scale with undef inputsMatt Arsenault2017-08-011-1/+55
| | | | | | | | | | | | The src0 register must match src1 or src2, but if these were undefined they could end up using different implicit_defed virtual registers. Force these to use one undef vreg or pick the defined other register. Also fixes producing invalid nodes without the right number of inputs when src2 is undef. llvm-svn: 309743
* [DAG] Factor out common expressions. NFC.Nirav Dave2017-08-011-19/+21
| | | | llvm-svn: 309740
* [Value Tracking] Default argument to true and rename accordingly. NFC.Chad Rosier2017-08-014-19/+18
| | | | | | IMHO this is a bit more readable. llvm-svn: 309739
* AMDGPU: Initial implementation of callsMatt Arsenault2017-08-0118-14/+574
| | | | | | | | | Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0. llvm-svn: 309732
* [DebugInfo] Don't turn dbg.declare into DBG_VALUE for static allocasReid Kleckner2017-08-011-0/+7
| | | | | | | | | | | | | | | | Summary: We already have information about static alloca stack locations in our side table. Emitting instructions for them is inefficient, and it only happens when the address of the alloca has been materialized within the current block, which isn't often. Reviewers: aprantl, probinson, dblaikie Subscribers: jfb, dschuff, sbc100, jgravelle-google, hiraditya, llvm-commits, aheejin Differential Revision: https://reviews.llvm.org/D36117 llvm-svn: 309729
* [Value Tracking] Refactor and/or logic into helper. NFC.Chad Rosier2017-08-011-40/+52
| | | | llvm-svn: 309726
* [AMDGPU] Put a function used only inside assert() under NDEBUG.Davide Italiano2017-08-011-0/+4
| | | | llvm-svn: 309723
* [lanai] Add getIntImmCost in LanaiTargetTransformInfo.Jacques Pienaar2017-08-011-0/+27
| | | | | | Add simple int immediate cost function. llvm-svn: 309721
* Pull out VectorNumElements value. NFC.Nirav Dave2017-08-011-13/+9
| | | | llvm-svn: 309719
* Revert "[DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector."Nirav Dave2017-08-011-26/+11
| | | | | | | This reverts commit r309680 which appears to be raising an assertion in the test-suite. llvm-svn: 309717
* [libFuzzer] temporarty remove pc-tables and disable ↵Kostya Serebryany2017-08-012-2/+3
| | | | | | test/fuzzer-printcovpcs.test until this can be fixed on Windows llvm-svn: 309716
* [X86][SSE] Added missing vector logic intrinsic schedulesSimon Pilgrim2017-08-011-10/+6
| | | | | | | | Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431). Merged SSE_VEC_BIT_ITINS_P + SSE_BIT_ITINS_P as we were interchanging between them. llvm-svn: 309715
* [CGP] use narrower types in memcmp expansion when possibleSanjay Patel2017-08-011-1/+6
| | | | | | | | This only affects very small memcmp on x86 for now, but it will become more important if we allow vector-sized load and compares. llvm-svn: 309711
* [DAG] Convert extload check to equivalent type check. NFC.Nirav Dave2017-08-011-5/+10
| | | | | | Replace check with check that consuming store has the same type. llvm-svn: 309708
* [X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large maskCraig Topper2017-08-011-5/+36
| | | | | | | | | | | | | | Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36129 llvm-svn: 309706
* [X86][SSE] Added missing PACKSS/PACKUS intrinsic schedulesSimon Pilgrim2017-08-013-8/+10
| | | | | | | | Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431). Checked on Agner that these actually match the UNPACK schedules, but better to include a separate class llvm-svn: 309701
* [X86][SSSE3] Added missing PHADDS/PHSUBS/PSIGN intrinsic schedulesSimon Pilgrim2017-08-011-2/+2
| | | | llvm-svn: 309699
OpenPOWER on IntegriCloud