path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* Recommit [MachineCombiner] Update instruction depths incrementally for large BBs. | Florian Hahn | 2017-09-20 | 2 | -23/+90

  This version of the patch fixes an off-by-one error causing PR34596. We do not need to use std::next(BlockIter) when calling updateDepths, as BlockIter already points to the next element.

  Original commit message:
  > For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and its relevant successors/predecessors.
  > In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance.
  >
  > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
  > Reviewed By: fhahn
  > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
  > Differential Revision: https://reviews.llvm.org/D36619

  llvm-svn: 313751
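The idea of computing instruction depths in a single forward pass can be sketched as follows. This is a simplified illustration, not the MachineCombiner code: the `Instr` struct, the `computeDepths` function, and the latency model are all hypothetical, and operands defined outside the block are assumed to be available at cycle 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction: the indices of earlier instructions
// in the same block that define its operands, plus its own latency in cycles.
struct Instr {
  std::vector<std::size_t> operandDefs;
  unsigned latency;
};

// One forward pass over the block. The depth of an instruction is the
// earliest cycle at which all of its operands are ready, i.e. the maximum
// over its operand definitions of (depth of def + latency of def).
// Instructions with no in-block operands get depth 0 (an assumption of
// this sketch).
std::vector<unsigned> computeDepths(const std::vector<Instr> &block) {
  std::vector<unsigned> depth(block.size(), 0);
  for (std::size_t i = 0; i < block.size(); ++i) {
    for (std::size_t def : block[i].operandDefs) {
      assert(def < i && "operands must be defined earlier in the block");
      depth[i] = std::max(depth[i], depth[def] + block[def].latency);
    }
  }
  return depth;
}
```

Because each instruction only looks at depths that were already computed, appending or replacing an instruction needs work proportional to its operand count rather than a recomputation over the whole block.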
* [X86][SSE] Remove unnecessary NonceMasks from combineX86ShufflesRecursively calls (NFCI) | Simon Pilgrim | 2017-09-20 | 1 | -25/+14
  llvm-svn: 313743
* [SLP] Vectorize jumbled memory loads. | Mohammad Shahid | 2017-09-20 | 2 | -83/+252

  Summary:
  This patch tries to vectorize loads of consecutive memory accesses, accessed in a non-consecutive or jumbled way. An earlier attempt was made with patch D26905 which was reverted back due to some basic issue with representing the 'use mask' of jumbled accesses.

  This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

  Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
  Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
  Reviewed By: Ayal
  Subscribers: mzolotukhin
  Differential Revision: https://reviews.llvm.org/D36130

  Commit after rebase for patch D36130
  Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab
  llvm-svn: 313736
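To illustrate what a 'use mask' for jumbled loads might look like, here is a minimal sketch. It is not the SLP vectorizer's representation: the function name `computeJumbledMask` and the convention that `mask[i]` names the lane of one wide load read by the i-th scalar load are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Given the element offsets of scalar loads in program order, decide whether
// together they cover one consecutive range (in any order). On success,
// mask[i] is the lane of the single wide load that the i-th scalar load
// reads, which is the shuffle needed to restore program order.
bool computeJumbledMask(const std::vector<int> &offsets,
                        std::vector<int> &mask) {
  mask.clear();
  if (offsets.empty())
    return false;
  const long long base = *std::min_element(offsets.begin(), offsets.end());
  std::vector<bool> seen(offsets.size(), false);
  for (int off : offsets) {
    const long long lane = static_cast<long long>(off) - base;
    // A gap pushes some lane past the end; a repeat hits a lane twice.
    if (lane >= static_cast<long long>(offsets.size()) ||
        seen[static_cast<std::size_t>(lane)]) {
      mask.clear();
      return false;
    }
    seen[static_cast<std::size_t>(lane)] = true;
    mask.push_back(static_cast<int>(lane));
  }
  return true;
}
```

For loads at offsets 1, 3, 2, 0 the mask is simply 1, 3, 2, 0: one vector load of lanes 0..3 followed by a shuffle with that mask reproduces the values in program order.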
* 'into' instruction should not be decoded as a valid instr in 64-bit mode | Andrew V. Tischenko | 2017-09-20 | 1 | -1/+1
  llvm-svn: 313735
* [X86] Remove isel checks for immediate size on floating point compare and xop compare instructions. NFCI | Craig Topper | 2017-09-20 | 3 | -36/+26

  If these checks fail we end up not selecting an instruction at all. So we are already relying on the immediate being checked upstream of isel. So doing the check in isel is just bloat to the isel table.

  Interestingly, we didn't check on the AVX512 version of the instructions anyway.
  llvm-svn: 313724
* [AMDGPU] Fixed memory leak with inliner replaced | Stanislav Mekhanoshin | 2017-09-20 | 1 | -1/+3
  Delete inliner before replacing it.
  llvm-svn: 313723
* AMDGPU: Move r600 only code into r600 only td file | Matt Arsenault | 2017-09-20 | 2 | -53/+54
  llvm-svn: 313719
* [AMDGPU] Fix regression in test clang/test/CodeGen/backend-unsupported-error.ll | Stanislav Mekhanoshin | 2017-09-20 | 1 | -1/+2
  llvm-svn: 313718
* AMDGPU: Match load d16 hi instructions | Matt Arsenault | 2017-09-20 | 5 | -50/+161

  Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation.

  We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves.
  llvm-svn: 313716
* [AMDGPU] Port of HSAIL inliner | Stanislav Mekhanoshin | 2017-09-20 | 5 | -1/+218
  Differential Revision: https://reviews.llvm.org/D36849
  llvm-svn: 313714
* AMDGPU: Cleanup load/store PatFrags | Matt Arsenault | 2017-09-20 | 7 | -271/+244
  Try to use a consistent naming scheme.
  llvm-svn: 313713
* AMDGPU: Match store d16_hi instructions | Matt Arsenault | 2017-09-20 | 4 | -18/+77
  llvm-svn: 313712
* Tighten the invariants around LoopBase::invalidate | Sanjoy Das | 2017-09-20 | 4 | -30/+30

  Summary:
  With this change:
  - Methods in LoopBase trip an assert if the receiver has been invalidated
  - LoopBase::clear frees up the memory held by the LoopBase instance

  This change also shuffles things around as necessary to work with this stricter invariant.

  Reviewers: chandlerc
  Subscribers: mehdi_amini, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D38055
  llvm-svn: 313708
* Reverting due to Green Dragon bot failure. | Mike Edwards | 2017-09-20 | 3 | -26/+3
  http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42594/
  llvm-svn: 313706
* Clang-format few files to make later diffs leaner; NFC | Sanjoy Das | 2017-09-20 | 1 | -36/+33
  llvm-svn: 313705
* GVNSink: Make ModelledPHIs constructor linear (and avoid edge case it worries about) by avoiding getIncomingValueForBlock | Daniel Berlin | 2017-09-20 | 1 | -7/+8
  llvm-svn: 313702
* Revert "[GVNSink] Remove dependency on SmallPtrSet iteration order." | Daniel Berlin | 2017-09-20 | 1 | -2/+0
  This reverts commit r312156, because now the op and block arrays are not in the same order :(.
  llvm-svn: 313701
* NewGVN: Remove unused includes | Daniel Berlin | 2017-09-20 | 1 | -21/+0
  llvm-svn: 313700
* [MIRPrinter] Print empty successor lists when they cannot be guessed | Quentin Colombet | 2017-09-19 | 1 | -2/+8

  This re-applies commit r313685, this time with the proper updates to the test cases.

  Original commit message:
  Unreachable blocks in the machine instr representation are these weird empty blocks with no successors. The MIR printer used to not print empty lists of successors. However, the MIR parser now treats a non-printed list of successors as "please guess it for me". As a result, the parser tries to guess the list of successors and, given the block is empty, just assumes it falls through to the next block (if any).

  For instance, the following test case used to fail the verifier. The MIR printer would print:

           entry
          /     \
    true (def)   false (no list of successors)
        |
    split.true (use)

  The MIR parser would understand this:

           entry
          /     \
    true (def)   false
        |        /   <-- invalid edge
    split.true (use)

  Because of the invalid edge, we get the "def does not dominate all uses" error.

  The fix consists in printing empty successor lists, so that the parser knows what to do for unreachable blocks.

  rdar://problem/34022159
  llvm-svn: 313696
* [LoopInfo] Make LoopBase and Loop destructors non-public | Sanjoy Das | 2017-09-19 | 4 | -6/+6

  Summary:
  See comment for why I think this is a good idea.

  This change also:
  - Removes an SCEV test case. The SCEV test was not testing anything useful (most of it was `#if 0` ed out) and it would need to be updated to deal with a private ~Loop::Loop.
  - Updates the loop pass manager test case to deal with a private ~Loop::Loop.
  - Renames markAsRemoved to markAsErased to contrast with removeLoop, via the usual remove vs. erase idiom we already have for instructions and basic blocks.

  Reviewers: chandlerc
  Subscribers: mehdi_amini, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D37996
  llvm-svn: 313695
* [WebAssembly] Add support for naming wasm data segments | Sam Clegg | 2017-09-19 | 3 | -3/+26

  Adds support for naming data segments. This is useful for linkers so that they can merge similar sections.

  Differential Revision: https://reviews.llvm.org/D37886
  llvm-svn: 313692
* Allow ORE.emit to take a closure to delay building the remark object | Adam Nemet | 2017-09-19 | 5 | -28/+40

  In the lambda we are now returning the remark by value so we need to preserve its type in the insertion operator. This requires making the insertion operator generic.

  I've also converted a few cases to use the new API. It seems to work pretty well. See the LoopUnroller for a slightly more interesting case.
  llvm-svn: 313691
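The general pattern of passing a closure so that an expensive object is only built when it will actually be used can be sketched like this. The `Remark` and `RemarkEmitter` types below are hypothetical stand-ins written for this illustration; they are not the OptimizationRemarkEmitter API.

```cpp
#include <string>
#include <utility>

// Hypothetical remark type for this sketch.
struct Remark {
  std::string text;
};

// Hypothetical emitter. emit() accepts any callable that returns a Remark
// by value and invokes it only when remarks are enabled, so the cost of
// building the remark is skipped entirely in the disabled case.
class RemarkEmitter {
public:
  explicit RemarkEmitter(bool enabled) : enabled_(enabled) {}

  template <typename BuilderT> void emit(BuilderT &&build) {
    if (!enabled_)
      return; // the closure is never called
    Remark r = std::forward<BuilderT>(build)();
    last_ = std::move(r.text);
    ++emitted_;
  }

  unsigned emitted() const { return emitted_; }
  const std::string &last() const { return last_; }

private:
  bool enabled_;
  unsigned emitted_ = 0;
  std::string last_;
};
```

A caller would write something like `emitter.emit([&] { return Remark{"unrolled loop"}; });`. With a disabled emitter the lambda body, including any string formatting inside it, never runs.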
* CodeGen: use range based for loops (NFC) | Saleem Abdulrasool | 2017-09-19 | 1 | -6/+1
  Simplify the RPOT traversal by using a range based for loop for the iterator dereference.
  llvm-svn: 313687
* Revert "[MIRPrinter] Print empty successor lists when they cannot be guessed" | Quentin Colombet | 2017-09-19 | 1 | -1/+2

  This reverts commit r313685. I thought I had run ninja check, but apparently I didn't... Need to update a bunch of mir tests.
  llvm-svn: 313686
* [MIRPrinter] Print empty successor lists when they cannot be guessed | Quentin Colombet | 2017-09-19 | 1 | -2/+1

  Unreachable blocks in the machine instr representation are these weird empty blocks with no successors. The MIR printer used to not print empty lists of successors. However, the MIR parser now treats a non-printed list of successors as "please guess it for me". As a result, the parser tries to guess the list of successors and, given the block is empty, just assumes it falls through to the next block (if any).

  For instance, the following test case used to fail the verifier. The MIR printer would print:

           entry
          /     \
    true (def)   false (no list of successors)
        |
    split.true (use)

  The MIR parser would understand this:

           entry
          /     \
    true (def)   false
        |        /   <-- invalid edge
    split.true (use)

  Because of the invalid edge, we get the "def does not dominate all uses" error.

  The fix consists in printing empty successor lists, so that the parser knows what to do for unreachable blocks.

  rdar://problem/34022159
  llvm-svn: 313685
* [ARM] Relax 'cpsie'/'cpsid' flag parsing. | Jonathan Roelofs | 2017-09-19 | 1 | -1/+1

  The ARM docs suggest in examples that the flags can have either case, and there are applications in the wild (libopencm3, for example) that expect to be able to use the uppercase spelling.

  https://reviews.llvm.org/D37953
  llvm-svn: 313680
* Revert "[DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs" | Reid Kleckner | 2017-09-19 | 1 | -54/+35

  This reverts r313640, originally r313400, one more time for essentially the same issue. My BitVector of spilled location numbers isn't working because we coalesce identical DBG_VALUE locations as we rewrite them, invalidating the location numbers used to index the BitVector.
  llvm-svn: 313679
* Import all inlined indirect call targets for SamplePGO. | Dehao Chen | 2017-09-19 | 1 | -3/+5

  Summary:
  In the ThinLTO compilation, if a function is inlined in the profiling binary, we need to inline it before annotation. If the callee is not available in the primary module, a first step is needed to import that callee function. For the current implementation, if the call is an indirect call, which has been promoted to >1 targets and inlined, SamplePGO will only import one target with the largest sample count. This patch fixed the bug to import all targets instead.

  Reviewers: tejohnson, davidxl
  Reviewed By: tejohnson
  Subscribers: sanjoy, llvm-commits, mehdi_amini
  Differential Revision: https://reviews.llvm.org/D36637
  llvm-svn: 313678
* [MSP430] Align functions on 2-byte boundary instead of 4. | Vadzim Dambrouski | 2017-09-19 | 1 | -1/+1

  Summary:
  There is no benefit in having the 4-byte alignment, and removing this restriction can save a lot of space for some applications.

  Reviewers: asl, awygle
  Reviewed By: awygle
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D36165
  llvm-svn: 313676
* [SimplifyCFG] fix typos/formatting; NFC | Sanjay Patel | 2017-09-19 | 1 | -24/+22
  llvm-svn: 313671
* [AMDGPU] Prevent post-RA scheduler from breaking memory clauses | Stanislav Mekhanoshin | 2017-09-19 | 2 | -0/+58

  The pre-RA scheduler does load/store clustering, but post-RA scheduler undoes it. Add mutation to prevent it.

  Differential Revision: https://reviews.llvm.org/D38014
  llvm-svn: 313670
* [SystemZ] Fix truncstore + bswap codegen bug | Ulrich Weigand | 2017-09-19 | 1 | -1/+2

  SystemZTargetLowering::combineSTORE contains code to transform a combination of STORE + BSWAP into a STRV type instruction.

  This transformation is correct for regular stores, but not for truncating stores. The routine neglected to check for that case.

  Fixes a miscompilation of llvm-objcopy with clang, which caused test suite failures in the SystemZ multistage build bot.
  llvm-svn: 313669
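A small numeric model shows why folding a byte swap into a byte-reversed store is unsafe for truncating stores. This sketch makes an assumption the commit message does not spell out: that the faulty fold would emit a 16-bit byte-reversed store of the original, unswapped register. The helper names are hypothetical and only model the stored bits; they are not SystemZ backend code.

```cpp
#include <cstdint>

// Reverse the bytes of a 16-bit value.
std::uint16_t bswap16(std::uint16_t v) {
  return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

// Reverse the bytes of a 32-bit value.
std::uint32_t bswap32(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// What the original code asks for: swap all four bytes of the 32-bit
// value, then store only the low 16 bits of the swapped result.
std::uint16_t truncStoreOfBswap(std::uint32_t v) {
  return static_cast<std::uint16_t>(bswap32(v));
}

// What a 16-bit byte-reversed store of the unswapped register would write
// (assumed behavior of the faulty fold): the low 16 bits, byte-reversed.
std::uint16_t reversedStore16(std::uint32_t v) {
  return bswap16(static_cast<std::uint16_t>(v));
}
```

For `v = 0x11223344`, the truncating store of the swapped value writes `0x2211` (bytes taken from the high half of `v`), while the reversed 16-bit store writes `0x4433` (bytes taken from the low half). For a full-width store the two forms agree, which is why the fold is only valid for regular stores.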
* Revert "ExecutionEngine: add R_AARCH64_ABS{16,32}" | Saleem Abdulrasool | 2017-09-19 | 1 | -12/+0

  This reverts commit SVN r313654. Seems that it is triggering an assertion on Windows specifically. Revert until I can build on Windows and look into what is happening there.
  llvm-svn: 313668
* dwarfdump/symbolizer: Avoid loading unneeded CUs from a DWP | David Blaikie | 2017-09-19 | 3 | -9/+11

  When symbolizing large binaries, parsing every CU in a DWP file is a significant performance penalty. Instead, use the index to only load the CUs that are needed.
  llvm-svn: 313659
* Handle profile mismatch correctly for SamplePGO. | Dehao Chen | 2017-09-19 | 1 | -1/+6

  Summary:
  Fix the bug where the promoted call's return type mismatches that of the promoted function: in that case we should not try to inline it, as doing so may crash the compiler.

  Reviewers: davidxl, tejohnson, eraman
  Reviewed By: tejohnson
  Subscribers: llvm-commits, sanjoy
  Differential Revision: https://reviews.llvm.org/D38018
  llvm-svn: 313658
* Re-land "Fix Bug 30978 by emitting cv file checksums." | Reid Kleckner | 2017-09-19 | 7 | -49/+179

  This reverts r313431 and brings back r313374 with a fix to write checksums as binary data and not ASCII hex strings.
  llvm-svn: 313657
* ExecutionEngine: add R_AARCH64_ABS{16,32} | Saleem Abdulrasool | 2017-09-19 | 1 | -0/+12

  Add support for the R_AARCH64_ABS{16,32} relocations in the execution engine. This is primarily used for DWARF debug information relocations and needed by the LLVM JIT to support JITing for lldb.

  Patch by Alex Langford!
  llvm-svn: 313654
* [X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns | Craig Topper | 2017-09-19 | 3 | -22/+3

  Similar to what we do for X86ISD::SHRUNKBLEND just turn X86ISD::SELECT into ISD::VSELECT.

  This allows us to remove the duplicated TRUNC patterns.

  Differential Revision: https://reviews.llvm.org/D38022
  llvm-svn: 313644
* Re-land r313400 "[DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs" | Reid Kleckner | 2017-09-19 | 1 | -35/+54

  I forgot to zero out the BitVector when reusing it between UserValues. Later uses of the same location number for a different UserValue would falsely indicate that they were spilled. Usually this would lead to incorrect debug info, but in some cases they would indicate something nonsensical like a memory location based on a vector register (Q8 on ARM).
  llvm-svn: 313640
* [PowerPC Peephole] Constants into a join add, use ADDI over LI/ADD. | Tony Jiang | 2017-09-19 | 1 | -0/+116

  Two blocks prior to the join each perform an li and the join block has an add using the initialized register. Optimize each predecessor block to instead use addi and delete the li's and add.

  Differential Revision: https://reviews.llvm.org/D36734
  llvm-svn: 313639
* [Power9] Add missing Power9 instructions. | Tony Jiang | 2017-09-19 | 5 | -442/+67

  The following 8 instructions are implemented in this patch.
  addpcis(subpcis, lnia), darn, maddhd, maddhdu, maddld, setb
  llvm-svn: 313636
* dwarfdump: Delay parsing abbreviations until they're needed | David Blaikie | 2017-09-19 | 2 | -10/+33

  This speeds up dumping specific DIEs by not parsing abbreviations for units that are not used.

  (This is also handy to have eventually, to speed up llvm-symbolizer for .dwp files, where parsing most of the DWP file can be avoided by using the index.)
  llvm-svn: 313635
* [globalisel] Add a G_BSWAP instruction and support bswap using it. | Daniel Sanders | 2017-09-19 | 1 | -0/+3
  llvm-svn: 313633
* [Nios2] Subtarget, basic infrastructure for frame, instructions and registers | Nikolai Bozhenov | 2017-09-19 | 15 | -20/+545

  This is the second minimal patch keeping Nios2 target buildable. I'm adding subtarget here and other stuff for frame lowering, instruction, register information methods. I do not add any test cases, as still there are missing parts like DAG selector and assembly printing. I plan to include them into the next patch.

  Patch by Andrei Grischenko <andrei.l.grischenko@intel.com>
  Differential Revision: https://reviews.llvm.org/D37256
  llvm-svn: 313626
* [x86] Lowering Mask Set1 intrinsics to LLVM IR | Jina Nahias | 2017-09-19 | 2 | -24/+7

  This patch, together with a matching clang patch (https://reviews.llvm.org/D37668), implements the lowering of X86 mask set1 intrinsics to IR.

  Differential Revision: https://reviews.llvm.org/D37669
  llvm-svn: 313625
* [ARM] Use ADDCARRY / SUBCARRY | Roger Ferrer Ibanez | 2017-09-19 | 2 | -20/+169

  This is a preparatory step for D34515.

  This change:
  - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
  - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition.
  - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics.
  - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned.
  - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
  - fixes PR34045
  - fixes PR34564

  Differential Revision: https://reviews.llvm.org/D35192
  llvm-svn: 313618
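The boolean-to-flag and flag-to-boolean conversions described in the list above can be modeled with plain 32-bit arithmetic. This is a semantic sketch only, covering the addition case: `addc`, `adde`, and `lowerAddCarry` are hypothetical helpers that mimic add-with-carry-out and add-with-carry-in, and none of it is SelectionDAG code.

```cpp
#include <cstdint>

// Result of a 32-bit add together with the carry flag it produces.
struct AddResult {
  std::uint32_t value;
  bool carry;
};

// Models an add that sets the carry flag (in the spirit of ARMISD::ADDC).
AddResult addc(std::uint32_t a, std::uint32_t b) {
  const std::uint64_t wide = static_cast<std::uint64_t>(a) + b;
  return {static_cast<std::uint32_t>(wide), (wide >> 32) != 0};
}

// Models an add that consumes and produces the carry flag (ARMISD::ADDE).
AddResult adde(std::uint32_t a, std::uint32_t b, bool carryIn) {
  const std::uint64_t wide =
      static_cast<std::uint64_t>(a) + b + (carryIn ? 1u : 0u);
  return {static_cast<std::uint32_t>(wide), (wide >> 32) != 0};
}

// Sum plus carry-out expressed as an integer 0 or 1, as ADDCARRY expects.
struct AddCarryResult {
  std::uint32_t value;
  std::uint32_t carryOut;
};

AddCarryResult lowerAddCarry(std::uint32_t lhs, std::uint32_t rhs,
                             std::uint32_t carryInBool) {
  // Boolean -> flag: adding -1 (0xFFFFFFFF) carries out iff the input != 0.
  const bool flag = addc(carryInBool, 0xFFFFFFFFu).carry;
  // The actual addition, consuming the flag.
  const AddResult sum = adde(lhs, rhs, flag);
  // Flag -> boolean: 0 + 0 + C yields the integer 0 or 1.
  const std::uint32_t carryOut = adde(0, 0, sum.carry).value;
  return {sum.value, carryOut};
}
```

Chaining two such steps, with the carry-out of the low half feeding the carry-in of the high half, gives a branch-free 64-bit add. The back-to-back flag→boolean→flag conversions between the steps are the redundant pair that the new combine in the commit removes.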
* AMDGPU: Run internalize symbols at -O0 | Matt Arsenault | 2017-09-19 | 1 | -21/+21

  The relocations used for externally visible functions aren't supported, so the direct call emitted ends up hitting a linker error.
  llvm-svn: 313616
* [X86][Skylake] Adding the scheduling information for the SkylakeClient target | Gadi Haber | 2017-09-19 | 3 | -3/+4014

  This patch adds the instruction scheduling information for the SkylakeClient (SKL) architecture target by adding the file X86SchedSkylakeClient.td located under the X86 Target. We used the scheduling information retrieved from the Skylake architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.

  The patch continues the scheduling replacement and insertion effort started with the SNB target in r307529 and r310792 and for HSW in r311879. Please expect some performance fluctuations due to code alignment effects.

  Reviewers: craig.topper, zvi, chandlerc, igorb, aymanmus, RKSimon, delena
  Differential Revision: https://reviews.llvm.org/D37294
  llvm-svn: 313613
* [X86] Remove some unnecessary patterns for truncate with X86ISD::SELECT and undef preserved source. | Craig Topper | 2017-09-19 | 1 | -6/+0

  We canonicalize undef preserved sources to zero during intrinsic lowering.
  llvm-svn: 313612
* [X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing table. | Craig Topper | 2017-09-19 | 1 | -0/+16
  llvm-svn: 313610