path: root/llvm/lib
Commit message (Author, Date; files changed, lines -removed/+added)
...
* fix formatting; NFC (Sanjay Patel, 2016-09-06; 1 file changed, -19/+14)
  llvm-svn: 280727
* [MCTargetDesc] Delete dead code. Found by GCC7 -Wunused-function. (Davide Italiano, 2016-09-06; 1 file changed, -17/+0)
  Also unbreak the build with newer GCC and -Werror.
  llvm-svn: 280726
* [RDF] Ignore undef use operands (Krzysztof Parzyszek, 2016-09-06; 1 file changed, -1/+1)
  llvm-svn: 280717
* Formatting with clang-format patch r280700 (Leny Kholodov, 2016-09-06; 3 files changed, -49/+47)
  llvm-svn: 280716
* [SelectionDAG] Simplify extract_subvector(insert_subvector(Vec, In, Idx), Idx) -> In (Simon Pilgrim, 2016-09-06; 1 file changed, -0/+6)
  If we are extracting a subvector that has just been inserted, then we should just use the original inserted subvector.
  This has come up in several x86 shuffle lowering cases where we are crossing 128-bit lanes.
  Differential Revision: https://reviews.llvm.org/D24254
  llvm-svn: 280715
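  As a rough illustration of the fold, here is a hedged sketch in DAGCombiner style; the variable names and exact placement are assumed, not taken from the patch:

    // Sketch: extract_subvector(insert_subvector(Vec, In, Idx), Idx) --> In
    SDValue Src = N->getOperand(0);   // vector being extracted from
    SDValue Idx = N->getOperand(1);   // extraction index
    if (Src.getOpcode() == ISD::INSERT_SUBVECTOR &&
        Src.getOperand(2) == Idx &&                              // inserted at the same index
        N->getValueType(0) == Src.getOperand(1).getValueType())  // same subvector type
      return Src.getOperand(1);                                  // reuse 'In' directly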
* [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info (Adam Nemet, 2016-09-06; 1 file changed, -1/+52)
  Currently the pass updates branch weights in the IR if the function has any PGO info (entry frequency is set). However, we could still have regions of the CFG that do not have branch weights collected (e.g. a cold region). In this case we'd use static estimates. Since static estimates for branches are determined independently, they are inconsistent. Updating them can "randomly" inflate block frequencies.
  I've run into this in a completely cold loop of h264ref from SPEC. -Rpass-with-hotness showed the loop to be completely cold during inlining (before JT) but completely hot during vectorization (after JT).
  The new testcase demonstrates the problem. We check array elements against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting check. The block names should be self-explanatory. In this example, jump threading incorrectly updates the weight of the loop-exiting branch to 0, drastically inflating the frequency of the loop (in the range of billions).
  There is no run-time profile info for edges inside the loop, so branch probabilities are estimated. These are the resulting branch and block frequencies for the loop body:

            check_1 (16)
        (8) /      |
      eq_1         | (8)
            \      |
            check_2 (16)
        (8) /      |
      eq_2         | (8)
            \      |
            check_3 (16)
        (1) /      |
   (loop exit)     | (15)
                   | (back edge)

  First we thread eq_1 -> check_2 to check_3. Frequencies are updated to remove the frequency of eq_1 from check_2 and then from the false edge leaving check_2. Changed frequencies are highlighted with * *:

            check_1 (16)
        (8) /      |
      eq_1~        | (8)
       /           |
      /     check_2 (*8*)
     /  (8) /      |
     \    eq_2     | (*0*)
      \     \      |
       ` --- check_3 (16)
        (1) /      |
   (loop exit)     | (15)
                   | (back edge)

  Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new back edges. Frequencies are updated to remove the frequency of eq_1 and eq_2 from check_3 and then from the false edge leaving check_3 (changed frequencies are highlighted with * *):

            check_1 (16)
        (8) /      |
      eq_1~        | (8)
       /           |
      /     check_2 (*8*)
     /  (8) /      |
    /-- eq_2~      | (*0*)
  (back edge)      |
            check_3 (*0*)
      (*0*) /      |
   (loop exit)     | (*0*)
                   | (back edge)

  As a result, the loop exit edge ends up with 0 frequency, which in turn makes the loop header have maximum frequency.
  There are a few potential problems here:
  1. The profile data seems odd. There is a single profile sample of the loop being entered. On the other hand, there are no weights inside the loop.
  2. Based on static estimation we shouldn't set edges to "extreme" values, i.e. extremely likely or unlikely.
  3. We shouldn't create profile metadata that is calculated from static estimation. I am not sure what the policy is, but it seems to make sense to treat profile metadata as something that is known to originate from profiling. Estimated probabilities should only be reflected in BPI/BFI.
  Any one of these would probably fix the immediate problem. I went for 3 because I think it's a good policy to have, and added a FIXME about 2.
  Differential Revision: https://reviews.llvm.org/D24118
  llvm-svn: 280713
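  A hedged sketch of what the new policy amounts to; the helper and variable names are assumed for illustration, not the literal patch:

    // Assumed helper: does this block's terminator carry real PGO weights?
    static bool hasProfileData(const BasicBlock *BB) {
      const TerminatorInst *TI = BB->getTerminator();
      return TI->getNumSuccessors() > 1 &&
             TI->getMetadata(LLVMContext::MD_prof) != nullptr;
    }

    // Write updated weights back only for blocks that originally had them;
    // purely estimated probabilities stay in BPI/BFI and out of the IR.
    if (hasProfileData(BB))
      PredTerm->setMetadata(LLVMContext::MD_prof,
                            MDBuilder(BB->getContext())
                                .createBranchWeights(TrueWeight, FalseWeight));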
* [Sparc][Leon] Corrected supported atomics size for processors supporting the Leon CASA instruction back to 32 bits (Chris Dewhurst, 2016-09-06; 1 file changed, -1/+1)
  This was erroneously checked in for 64 bits while trying to find out whether there was a way to get 64-bit atomicity on Leon processors. There is not, and this change should not have been checked in. There is no unit test for this, as the existing unit tests test for behaviour to 32 bits, which was the original intention of the code.
  llvm-svn: 280710
* [mips] Tighten FastISel restrictions (Simon Dardis, 2016-09-06; 1 file changed, -1/+17)
  LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower arguments assuming that it was using paired 32-bit registers to perform operations on f64. This mode of operation is not supported for MIPSR6. This patch resolves the reported issue by adding additional checks for unsupported floating-point unit configurations.
  Thanks to mike.k for reporting this issue!
  Reviewers: seanbruno, vkalintiris
  Differential Revision: https://reviews.llvm.org/D23795
  llvm-svn: 280706
* [PPC] Claim stack frame before storing into it, if no red zone is present (Krzysztof Parzyszek, 2016-09-06; 1 file changed, -25/+91)
  Unlike PPC64, PPC32/SVR4 does not have a red zone. In its absence, there is no guarantee that this part of the stack will not be modified by an interrupt. To avoid this, make sure to claim the stack frame first, before storing into it.
  This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.
  Differential Revision: https://reviews.llvm.org/D24093
  llvm-svn: 280705
* DebugInfo: use strongly typed enum for debug info flags (Leny Kholodov, 2016-09-06; 5 files changed, -64/+70)
  Use ADT/BitmaskEnum for DINode::DIFlags for the following purposes:
  * Get rid of unsigned int for flags, to avoid problems on platforms with sizeof(int) < 4
  * Flags are now strongly typed
  Patch by: Victor Leschuk <vleschuk@gmail.com>
  Differential Revision: https://reviews.llvm.org/D23766
  llvm-svn: 280700
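  A minimal illustration of the ADT/BitmaskEnum pattern this commit adopts; the flag names below are illustrative, not the exact DIFlags from the patch:

    #include "llvm/ADT/BitmaskEnum.h"
    #include <cstdint>

    namespace llvm {
    enum DIFlags : uint32_t {          // fixed 32-bit underlying type, not plain 'unsigned int'
      FlagZero      = 0,
      FlagPrivate   = 1u << 0,
      FlagProtected = 1u << 1,
      FlagFwdDecl   = 1u << 2,
      LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/FlagFwdDecl)  // enables |, &, ^ on the enum
    };
    } // namespace llvm

    // Stays strongly typed; no implicit conversion to/from int.
    llvm::DIFlags F = llvm::FlagPrivate | llvm::FlagFwdDecl;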
* [RegisterScavenger] Remove aliasing registers of operands from the candidate set (Silviu Baranga, 2016-09-06; 1 file changed, -1/+2)
  Summary: In addition to not including the register operands of the current instruction, also don't include any aliasing registers. We can't consider these as candidates because using them will clobber the corresponding register operand of the current instruction.
  This change doesn't include a test case, and it would probably be difficult to produce a stable one, since the bug depends on the results of register allocation.
  Reviewers: MatzeB, qcolombet, hfinkel
  Subscribers: hfinkel, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24130
  llvm-svn: 280698
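  A hedged sketch of the candidate filtering; variable names are assumed, with 'Candidates' standing for the scavenger's BitVector of allocatable registers:

    // Exclude each register operand of MI *and* everything aliasing it;
    // an aliasing register would still clobber the operand.
    for (const MachineOperand &MO : MI->operands()) {
      if (!MO.isReg() || !MO.getReg())
        continue;
      for (MCRegAliasIterator AI(MO.getReg(), TRI, /*IncludeSelf=*/true);
           AI.isValid(); ++AI)
        Candidates.reset(*AI);
    }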
* [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast (Craig Topper, 2016-09-06; 3 files changed, -58/+39)
  We need to bitcast the index operand to a floating point type so that it matches the result type. If not, then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.
  llvm-svn: 280696
* [X86] Remove unused encoding from IntrinsicType enum (Craig Topper, 2016-09-06; 2 files changed, -4/+1)
  llvm-svn: 280694
* [X86] Fix indentation. NFC (Craig Topper, 2016-09-06; 1 file changed, -2/+2)
  llvm-svn: 280693
* ARM: workaround bundled operation predication (Saleem Abdulrasool, 2016-09-06; 1 file changed, -0/+3)
  This is a Windows ARM specific issue. If the code path in the if-conversion ends up using a relocation which will form an IMAGE_REL_ARM_MOV32T, we end up with a bundle to ensure that the mov.w/mov.t pair is not split up. This is normally fine; however, if the branch is also predicated, then we end up trying to predicate the bundle.
  For now, report a bundle as being unpredicatable. Although this is false, it would trigger a failure case previously anyway, so this is no worse. That is, there should not be any code which would previously have been if-converted and predicated which would not be now.
  Under certain circumstances, it may be possible to "predicate the bundle". This would require scanning all bundled instructions, ensuring that the bundle contains only predicatable instructions, and converting the bundle into an IT block sequence. If the bundle is larger than the maximal IT block length (4 instructions), it would require materializing multiple IT blocks from the single bundle.
  llvm-svn: 280689
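  A plausible shape of the workaround, as a hedged sketch; the exact hook and signature are assumed rather than quoted from the patch:

    // Conservatively treat bundles as unpredicatable so if-conversion
    // never tries to predicate a mov.w/mov.t bundle.
    bool ARMBaseInstrInfo::isPredicable(MachineInstr &MI) const {
      if (MI.isBundle())
        return false;
      // ... existing per-instruction checks continue here ...
      return TargetInstrInfo::isPredicable(MI);
    }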
* Revert "DebugInfo: use strongly typed enum for debug info flags"Mehdi Amini2016-09-065-104/+101
| | | | | | This reverts commit r280686, bots are broken. llvm-svn: 280688
* [LTO] Constify (NFC) (Mehdi Amini, 2016-09-06; 1 file changed, -16/+20)
  llvm-svn: 280687
* DebugInfo: use strongly typed enum for debug info flags (Mehdi Amini, 2016-09-06; 5 files changed, -101/+104)
  Use ADT/BitmaskEnum for DINode::DIFlags for the following purposes:
  * Get rid of unsigned int for flags, to avoid problems on platforms with sizeof(int) < 4
  * Flags are now strongly typed
  Patch by: Victor Leschuk <vleschuk@gmail.com>
  Differential Revision: https://reviews.llvm.org/D23766
  llvm-svn: 280686
* [AVX-512] Fix v8i64 shift by immediate lowering on 32-bit targets (Craig Topper, 2016-09-06; 1 file changed, -1/+2)
  llvm-svn: 280684
* CodeGen: ensure that libcalls are always AAPCS CC (Saleem Abdulrasool, 2016-09-06; 1 file changed, -7/+6)
  All of the builtins are designed to be invoked with the ARM AAPCS CC, even on ARM AAPCS-VFP CC hosts. Tweak the default initialisation to ARM AAPCS CC rather than C CC for ARM/Thumb targets.
  The changes to the tests are necessary to ensure that the calling convention for the lowered library calls is honoured. Furthermore, these adjustments cause certain branch invocations to change to branch-and-link, since the returned value needs to be moved across registers (d0 -> r0, r1).
  llvm-svn: 280683
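  A hedged sketch of the kind of initialisation this implies; the placement in the ARM target's lowering constructor is assumed purely for illustration:

    // Force AAPCS for all runtime library calls, even when the target's
    // default C calling convention is AAPCS-VFP (hard-float).
    for (unsigned LC = 0; LC < RTLIB::UNKNOWN_LIBCALL; ++LC)
      setLibcallCallingConv(static_cast<RTLIB::Libcall>(LC),
                            CallingConv::ARM_AAPCS);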
* [AVX-512] Teach fastisel load/store handling to use EVEX encoded instructions for 128/256-bit vectors and scalar single/double (Craig Topper, 2016-09-05; 1 file changed, -42/+81)
  Still need to fix the register classes to allow the extended range of registers.
  llvm-svn: 280682
* [Coroutines] Part12: Handle alloca address-taken (Gor Nishanov, 2016-09-05; 1 file changed, -1/+46)
  Summary: Move early uses of spilled variables after CoroBegin. For example, if a parameter had its address taken, we may end up with code like:

    define @f(i32 %n) {
      %n.addr = alloca i32
      store %n, %n.addr
      ...
      call @coro.begin

  This patch fixes the problem by moving uses of spilled variables after CoroBegin.
  Reviewers: majnemer
  Subscribers: mehdi_amini, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24234
  llvm-svn: 280678
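  A hedged sketch of the mechanics; names are assumed, and the real pass first computes the set of offending users of the spilled allocas:

    // Sink each early user to just after coro.begin so the value is
    // written into the coroutine frame rather than the dying stack slot.
    for (Instruction *U : EarlyUsersToMove)
      U->moveBefore(CoroBegin->getNextNode());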
* [InstCombine] don't assert that division-by-constant has been folded (PR30281) (Sanjay Patel, 2016-09-05; 1 file changed, -7/+6)
  This is effectively a revert of https://reviews.llvm.org/rL280115 and should fix https://llvm.org/bugs/show_bug.cgi?id=30281.
  llvm-svn: 280677
* [InstCombine] revert r280637 because it causes test failures on an ARM bot (Sanjay Patel, 2016-09-05; 1 file changed, -33/+43)
  http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll
  llvm-svn: 280676
* [AVX-512] Integrate mask register copying more completely into X86InstrInfo::copyPhysReg and simplify. No functional change intended (Craig Topper, 2016-09-05; 1 file changed, -68/+53)
  The code is now written in terms of source and destination classes, with feature checks inside each type of copy, instead of having separate functions for each feature set.
  llvm-svn: 280673
* [WebAssembly] Unbreak the build (Benjamin Kramer, 2016-09-05; 1 file changed, -8/+9)
  Not sure why ADL isn't working here.
  llvm-svn: 280656
* [AMDGPU] Refactor FLAT TD instructions (Valery Pykhtin, 2016-09-05; 6 files changed, -438/+525)
  Differential Revision: https://reviews.llvm.org/D24072
  llvm-svn: 280655
* [Thumb1] Add relocations for fixups fixup_arm_thumb_{br,bcc} (James Molloy, 2016-09-05; 1 file changed, -0/+6)
  These need to be mapped through to R_ARM_THM_JUMP11 and R_ARM_THM_JUMP8 respectively. Fixes PR30279.
  llvm-svn: 280651
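  A hedged sketch of the mapping; the wrapper function is assumed for illustration, while the real logic lives in the ARM ELF object writer's fixup-to-relocation handling:

    #include "MCTargetDesc/ARMFixupKinds.h"
    #include "llvm/Support/ELF.h"

    static unsigned mapThumb1BranchFixup(unsigned Kind) {
      switch (Kind) {
      case ARM::fixup_arm_thumb_br:   // Thumb1 unconditional branch (B)
        return ELF::R_ARM_THM_JUMP11;
      case ARM::fixup_arm_thumb_bcc:  // Thumb1 conditional branch (Bcc)
        return ELF::R_ARM_THM_JUMP8;
      default:
        return ELF::R_ARM_NONE;
      }
    }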
* [AVX512] Fix v8i1/v16i1 zext + bitcast lowering pattern. Explicitly zero upper bits (Igor Breger, 2016-09-05; 1 file changed, -4/+4)
  Differential Revision: http://reviews.llvm.org/D23983
  llvm-svn: 280650
* [X86] Make some static arrays of opcodes const and shrink to uint16_t. NFC (Craig Topper, 2016-09-05; 1 file changed, -6/+6)
  llvm-svn: 280649
* [AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX (Craig Topper, 2016-09-05; 3 files changed, -33/+7)
  We should use the VEX opcodes and trust the register allocator not to use the extended XMM/YMM register space. Previously we were widening the copy to the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration, as the instructions to spill them aren't available.
  llvm-svn: 280648
* [Coroutines] Part11: Add final suspend handling (Gor Nishanov, 2016-09-05; 3 files changed, -17/+93)
  Summary: A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:
  * it is possible to check whether a suspended coroutine is at the final suspend point via the coro.done intrinsic;
  * a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via the coro.destroy intrinsic.
  This patch adds final suspend handling logic to the CoroEarly and CoroSplit passes. Now, the final suspend point example from docs\Coroutines.rst compiles and produces the expected result (see test/Transform/Coroutines/ex5.ll).
  Reviewers: majnemer
  Subscribers: mehdi_amini, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24068
  llvm-svn: 280646
* [X86] Remove FsVMOVAPSrm/FsVMOVAPDrm/FsMOVAPSrm/FsMOVAPDrm. Due to their placement in the td file they had lower precedence than (V)MOVSS/SD and could almost never be selected (Craig Topper, 2016-09-05; 3 files changed, -40/+0)
  The only way to select them was in AVX512 mode, because EVEX VMOVSS/SD was below them and the patterns weren't qualified properly for AVX only. So if you happened to have an aligned FR32/FR64 load in AVX512, you could get a VEX encoded VMOVAPS/VMOVAPD.
  I tried to search back through history, and it seems like these instructions were probably unselectable for at least 5 years, at least back to when the VEX versions were added. But I can't prove they ever were selectable.
  llvm-svn: 280644
* [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors (Sanjay Patel, 2016-09-04; 1 file changed, -43/+33)
  The code to calculate 'UsesRemoved' could be simplified. As-is, that code is a victim of PR30273: https://llvm.org/bugs/show_bug.cgi?id=30273
  llvm-svn: 280637
* [AVX-512] Add EVEX encoded scalar FMA intrinsic instructions to isNonFoldablePartialRegisterLoad (Craig Topper, 2016-09-04; 1 file changed, -12/+24)
  llvm-svn: 280636
* [AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div intrinsics and upgrade to native IR (Craig Topper, 2016-09-04; 2 files changed, -16/+44)
  llvm-svn: 280633
* [ORC] Clone module flags metadata into the globals module in the CompileOnDemandLayer (Lang Hames, 2016-09-04; 1 file changed, -0/+9)
  Also contains a tweak to the orc-lazy JIT in LLI to enable the test case.
  llvm-svn: 280632
* [InstCombine] recode icmp fold in a vector-friendly way; NFC (Sanjay Patel, 2016-09-04; 1 file changed, -22/+30)
  The transform in question:
    icmp (and (trunc W), C2), C1 -> icmp (and W, C2'), C1'
  ...is still not enabled for vectors, thus no functional change intended. It's not clear to me if this is a good transform for vectors, or even scalars in general. Changing that behavior may be a follow-on patch.
  llvm-svn: 280627
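  A hedged sketch of the vector-friendly shape of this match, restricted to equality predicates and with variable names assumed; m_APInt is what makes it work for splat constant vectors as well as scalars:

    // icmp (and (trunc W), C2), C1 -> icmp (and W, C2'), C1'
    // where C1'/C2' are C1/C2 zero-extended to W's type.
    ICmpInst::Predicate Pred;
    Value *W;
    const APInt *C1, *C2;
    if (match(&Cmp, m_ICmp(Pred, m_And(m_Trunc(m_Value(W)), m_APInt(C2)),
                           m_APInt(C1))) &&
        ICmpInst::isEquality(Pred)) {
      Type *WideTy = W->getType();
      unsigned WideBits = WideTy->getScalarSizeInBits();
      Value *WideAnd =
          Builder.CreateAnd(W, ConstantInt::get(WideTy, C2->zext(WideBits)));
      return new ICmpInst(Pred, WideAnd,
                          ConstantInt::get(WideTy, C1->zext(WideBits)));
    }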
* [PowerPC] During branch relaxation, recompute padding offsets before each iteration (Hal Finkel, 2016-09-04; 1 file changed, -7/+39)
  We used to compute the padding contributions to the block sizes during branch relaxation only at the start of the transformation. As we perform branch relaxation, we change the sizes of the blocks, and so the amount of inter-block padding might change. Accordingly, we need to recompute the (alignment-based) padding in between every iteration on our way toward the fixed point.
  Unfortunately, I don't have a test case (and none was provided in the bug report), and while this obviously seems needed, algorithmically, I don't have any way of generating a small and/or non-fragile regression test.
  llvm-svn: 280626
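  A hedged sketch of the fixed-point structure this describes; the array names are assumed, and the point is only that padding is folded back in on every iteration rather than once up front:

    bool Changed;
    do {
      Changed = false;
      unsigned Offset = 0;
      for (MachineBasicBlock &MBB : MF) {
        // Padding inserted to satisfy the block's alignment; this can
        // change between iterations as relaxed branches grow blocks.
        unsigned AlignBytes = 1u << MBB.getAlignment(); // log2 at the time
        Offset = alignTo(Offset, AlignBytes);
        BlockOffset[MBB.getNumber()] = Offset;
        Offset += BlockSize[MBB.getNumber()];
      }
      // ... relax any branch whose target is now out of range,
      // updating BlockSize and setting Changed ...
    } while (Changed);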
* Revert r279960 (Igor Breger, 2016-09-04; 2 files changed, -23/+5)
  https://llvm.org/bugs/show_bug.cgi?id=30249
  llvm-svn: 280625
* Strip trailing whitespace (Simon Pilgrim, 2016-09-04; 1 file changed, -1/+1)
  llvm-svn: 280623
* [LCG] Clean up and make NDEBUG verify calls more rigorous with make_scope_exit, now that we have that utility (Chandler Carruth, 2016-09-04; 1 file changed, -32/+38)
  This makes the code much more clear and readable by isolating the check. It also makes it easy to go through and make sure all the interesting update routines have a start and end verify, so we don't slowly let the graph drift into an invalid state.
  llvm-svn: 280619
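  An illustrative use of the pattern; the routine below is a stand-in, not one of the actual LazyCallGraph call sites:

    #include "llvm/ADT/ScopeExit.h"

    void exampleUpdateRoutine() {
    #ifndef NDEBUG
      verify();
      // verify() now also runs on every exit path of this routine.
      auto VerifyOnExit = llvm::make_scope_exit([&] { verify(); });
    #endif
      // ... graph mutation happens here ...
    }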
* [LCG] An NFC refactoring to extract the logic for doing a postorder-sequence based update after edge insertion into a generic helper function (Chandler Carruth, 2016-09-04; 1 file changed, -111/+184)
  This separates the SCC-specific logic into two fairly simple lambdas and extracts the rest into a generic helper template function. I think this is a net win on its own merits because it disentangles different pieces of the algorithm. Now there is one place that does the two-step partition to identify a set of newly connected components and at the same time update the postorder sequence.
  However, I'm also hoping to re-use this in an upcoming patch to update a cached post-order sequence of RefSCCs when doing the analogous update to the RefSCC graph, and I don't want to have two copies.
  The diff is quite messy, but this really is just moving things around and making types generic rather than specific.
  llvm-svn: 280618
* [InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing memcpy with ld/st (Dorit Nuzman, 2016-09-04; 1 file changed, -0/+6)
  When InstCombine replaces a memcpy with loads+stores, it does not copy over the llvm.mem.parallel_loop_access metadata from the memcpy instruction. This patch fixes that.
  Differential Revision: https://reviews.llvm.org/D23499
  llvm-svn: 280617
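  A hedged sketch of the fix; the variable names are assumed, with 'MI' as the memcpy being replaced and 'L'/'S' as the new load and store:

    // Propagate the parallel-loop marker from the memcpy onto the
    // replacement load/store so the loop stays vectorizable.
    if (MDNode *LoopMemParallelMD =
            MI->getMetadata(LLVMContext::MD_mem_parallel_loop_access)) {
      L->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
      S->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
    }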
* [ExecutionEngine] Move ObjectCache::anchor from MCJIT to ExecutionEngine (Lang Hames, 2016-09-04; 2 files changed, -2/+3)
  ObjectCache is an ExecutionEngine utility, so its anchor belongs there. The practical impact of this change is that ORC users no longer need to link MCJIT to use ObjectCaches.
  llvm-svn: 280616
* Test commit (Dorit Nuzman, 2016-09-04; 1 file changed, -0/+1)
  llvm-svn: 280615
* [PowerPC] Zero-extend constants in FastISel (Hal Finkel, 2016-09-04; 1 file changed, -1/+6)
  As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which are illegal types promoted to i32 on PowerPC, is a choice constrained by assumptions within the infrastructure. Specifically, the logic in FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI operands will be zero-extended, and so, at least when materializing constants that are PHI operands, we must do the same.
  The rest of our fast-isel implementation does not appear to depend on the fact that we were sign-extending i8/i16 constants, and all other targets also appear to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we had been doing this only for i1 constants, and sign-extending the others).
  Fixes PR27721.
  llvm-svn: 280614
* [AVX-512] Remove masked integer add/sub/mul intrinsics and upgrade to native IR (Craig Topper, 2016-09-04; 2 files changed, -33/+15)
  llvm-svn: 280611
* Fix inliner funclet unwind memoization (Joseph Tremoulet, 2016-09-04; 1 file changed, -7/+79)
  Summary: The inliner may need to determine where a given funclet unwinds to, and this determination may depend on other funclets throughout the funclet tree. The code that performs this walk in getUnwindDestToken memoizes results to avoid redundant computations.
  In the case that a funclet's unwind destination is derived from its ancestor, there's code to walk back down the tree from the ancestor, updating the memo map of its descendants to record the unwind destination. This change fixes that code to account for the case that some descendant has a different unwind destination, which can happen if that unwind dest is a descendant of the EHPad being queried and thus didn't determine its unwind destination.
  Also update test inline-funclets.ll, which is supposed to cover such scenarios, to include a case that fails an assertion without this fix but passes with it.
  Fixes PR29151.
  Reviewers: majnemer
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D24117
  llvm-svn: 280610
* [X86] Combine some of the strings in autoupgrade code (Craig Topper, 2016-09-03; 1 file changed, -35/+7)
  llvm-svn: 280603