| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
| |
Subscribers: dschuff, mehdi_amini, inglorion, jgravelle-google, aheejin, sunfish, steven_wu, llvm-commits
Differential Revision: https://reviews.llvm.org/D48689
llvm-svn: 336118
|
| |
|
|
|
|
|
|
| |
Temporary revert because of a failing test on some buildbots.
This reverts commit r336114.
llvm-svn: 336117
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D48748
llvm-svn: 336116
|
| |
|
|
|
|
| |
Imply dotprod for armv8.4-a, because it is mandatory from v8.4.
llvm-svn: 336115
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch is the first in a series of patches related to the [[ http://lists.llvm.org/pipermail/llvm-dev/2018-June/123883.html | RFC - A new dominator tree updater for LLVM ]].
This patch introduces the DomTreeUpdater class, which provides a cleaner API to perform updates on available dominator trees (none, only DomTree, only PostDomTree, both) using different update strategies (eagerly or lazily) to simplify the updating process.
—Prior to the patch—
- Directly calling update functions of DominatorTree updates the data structure eagerly while DeferredDominance does updates lazily.
- DeferredDominance class cannot be used when a PostDominatorTree also needs to be updated.
- Functions receiving DT/DDT need to branch a lot which is currently necessary.
- Functions using both DomTree and PostDomTree need to call the update function separately on both trees.
- People need to construct an additional DeferredDominance class to use functions only receiving DDT.
—After the patch—
Patch by Chijun Sima <simachijun@gmail.com>.
Reviewers: kuhar, brzycki, dmgreen, grosser, davide
Reviewed By: kuhar, brzycki
Subscribers: vsk, mgorny, llvm-commits
Author: NutshellySima
Differential Revision: https://reviews.llvm.org/D48383
llvm-svn: 336114
|
| |
|
|
|
|
| |
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.
llvm-svn: 336113
|
| |
|
|
|
|
|
|
| |
blend
We have special case support for 2 shift values for basic blends, but irregular shift patterns end up using the generic lowering, despite shuffle lowering being good enough to handle more complex blends.
llvm-svn: 336112
|
| |
|
|
| |
llvm-svn: 336111
|
| |
|
|
| |
llvm-svn: 336110
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Replace use of a SmallPtrSet with a SmallSetVector to make the worklist
iteration order deterministic. This is done as the order the blocks are
removed may affect whether or not PHI nodes in successor blocks are
removed.
For example, consider the following case where %bb1 and %bb2 are
removed:
bb1:
br i1 undef, label %bb3, label %bb4
bb2:
br i1 undef, label %bb4, label %bb3
bb3:
pv1 = phi type [ undef, %bb1 ], [ undef, %bb2], [ v0, %other ]
br label %bb4
bb4:
pv2 = phi type [ undef, %bb1 ], [ undef, %bb2 ],
[ pv1, %bb3 ], [ v0, %other ]
If %bb2 is removed before %bb1, the incoming values from %bb1 and %bb2
to pv1 will be removed before %bb1 is removed as a predecessor to %bb4.
The pv1 node will thus be optimized out (to v0) at the time %bb1 is
removed as a predecessor to %bb4, leaving the blocks as following when
the incoming value from %bb1 has been removed:
bb3: ; pv1 optimized out, incoming value to pv2 is v0
br label %bb4
bb4:
pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
The pv2 PHI node will be optimized away by removePredecessor() as all
incoming values are identical.
In case %bb2 is removed after %bb1, pv1 will not be optimized out at the
time %bb2 is removed as a predecessor to %bb4, leaving the blocks as
following when the incoming value from %bb2 to pv2 has been removed:
bb3:
pv1 = phi type [ undef, %bb2 ], [ v0, %other ]
br label %bb4
bb4:
pv2 = phi type [ pv1, %bb3 ], [ v0, %other ]
The pv2 PHI node will thus not be removed in this case, ultimately
leading to the following output
bb3: ; pv1 optimized out, incoming value to pv2 is v0
br label %bb4
bb4:
pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
I have not looked into changing DeleteDeadBlock() so that the redundant
PHI nodes are removed.
I have not added a test case, as I was not able to create a particularly
small and (not messy) reproducer. This is likely due to SmallPtrSet
behaving deterministically when in small mode.
Reviewers: void, dexonsmith, spatel, skatkov, fhahn, bkramer, nhaehnle
Reviewed By: fhahn
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D48369
llvm-svn: 336109
|
| |
|
|
| |
llvm-svn: 336108
|
| |
|
|
|
|
|
|
| |
This was my mistake for only running test/MC/X86 and test/CodeGen/X86.
Arguably .word should be removed from this test, as it is not supported
universally.
llvm-svn: 336107
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have the following code that is uncovered with the test:
https://github.com/llvm-mirror/lld/blob/master/ELF/Target.cpp#L95
This patch:
1) Removes "!IS" check. Because at that point of execution
(we are reolving the relocations during writing output)
we should only have InputSection type of the sections in the vector.
(because we already converted MergeInputSection in mergeSections()
and combined EhInputSections in combineEhFrameSections()).
2) Covers the "!IS->getParent()" with the test.
llvm-svn: 336106
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Currently the llvm-exegesis native architecture is determined by comparing the
llvm native architecture with X86, so to add a new target would mean adding a
new check. Change this to building up a list of the targets llvm-exegesis
supports then using that, as this means that when adding a new target you just
add the target to the list of supported targets.
Differential Revision: https://reviews.llvm.org/D48778
llvm-svn: 336105
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The X86 asm parser currently has custom parsing logic for .word. Rather than
use this custom logic, we can just use addAliasForDirective to enable the
reuse of AsmParser::parseDirectiveValue.
See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon
(rL332607) backends.
Differential Revision: https://reviews.llvm.org/D47004
This is a fixed reland of rL336100. This should have been caught in
pre-commit testing so apologies for the noise.
llvm-svn: 336104
|
| |
|
|
|
|
| |
This was a bad change. .word == 2byte on x86.
llvm-svn: 336103
|
| |
|
|
|
|
|
|
| |
getEntryCost
This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure.
llvm-svn: 336102
|
| |
|
|
|
|
|
|
|
| |
Due to current limitations in constant analysis, we need flags
on add or mul to show propagation for the potential transform
suggested in these tests (no other binops currently report
identity constants).
llvm-svn: 336101
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The X86 asm parser currently has custom parsing logic for .word. Rather than
use this custom logic, we can just use addAliasForDirective to enable the
reuse of AsmParser::parseDirectiveValue.
See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon
(rL332607) backends.
Differential Revision: https://reviews.llvm.org/D47004
llvm-svn: 336100
|
| |
|
|
|
|
|
|
|
|
|
| |
Currently the cycle counter is taken from the subtarget schedule model, which
isn't any use if the subtarget doesn't have one. Delegate the decision to the
target benchmark runner, as it may know better what to do in that case, with
the default being the current behaviour.
Differential Revision: https://reviews.llvm.org/D48779
llvm-svn: 336099
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
parameters.
This version contains a fix to add values for which the state in ParamState change
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true, if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.
Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.
Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
Reviewers: dberlin, mssimpso, davide, efriedma
Reviewed By: mssimpso
Differential Revision: https://reviews.llvm.org/D43762
llvm-svn: 336098
|
| |
|
|
|
|
|
|
|
|
| |
MSVC limits char16_t and char32_t string literal names to 32 bytes of character
data, not to 32 characters. wchar_t string literal names on the other hand can
get up to 64 bytes of character data.
https://reviews.llvm.org/D48781
llvm-svn: 336097
|
| |
|
|
|
|
| |
This is another pattern mentioned in PR37806.
llvm-svn: 336096
|
| |
|
|
|
|
|
|
|
|
| |
handle SK_Select patterns.
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.
This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...
llvm-svn: 336095
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This provides more structured information that embedders can use for rendering.
ClangdLSPServer continues to call render(), so NFC.
The patch is:
- trivial changes to ClangdServer/ClangdLSPServer
- mostly-mechanical updates to CodeCompleteTests etc for the new API
- new direct tests of render() in CodeCompleteTests
- tiny cleanups to CodeCompletionItem (operator<< and missing initializers)
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48821
llvm-svn: 336094
|
| |
|
|
|
|
| |
It duplicated the default implementation.
llvm-svn: 336093
|
| |
|
|
|
|
|
|
| |
getEntryCost/vectorizeTree. NFCI.
Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators.
llvm-svn: 336092
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Increments/decrements the result with the number of active bits
from the predicate.
The inc/dec variants added are:
- incp x0, p0.h (scalar)
- incp z0.h, p0 (vector)
The unsigned saturating inc/dec variants added are:
- uqincp x0, p0.h (scalar)
- uqincp w0, p0.h (scalar, 32bit)
- uqincp z0.h, p0 (vector)
The signed saturating inc/dec variants added are:
- sqincp x0, p0.h (scalar)
- sqincp x0, p0.h, w0 (scalar, 32bit)
- sqincp z0.h, p0 (vector)
llvm-svn: 336091
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Increment/decrement vector by multiple of predicate constraint
element count.
The variants added by this patch are:
- INCH, INCW, INC
and (saturating):
- SQINCH, SQINCW, SQINCD
- UQINCH, UQINCW, UQINCW
- SQDECH, SQINCW, SQINCD
- UQDECH, UQINCW, UQINCW
For example:
incw z0.s, all, mul #4
llvm-svn: 336090
|
| |
|
|
|
|
|
|
| |
llvm-exegesis uop testing
We don't have PMCs to cover many of the Jaguar resources but we can at least monitor the FPU issue pipes which give an indication of the fpu uop count, just not the execution resources.
llvm-svn: 336089
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When compiling with icc, there is a problem with reenter frame addresses in
parallel_begin callbacks in the interoperability.c testcase. (The address is
not available. thus NULL)
Using alloca() forces availability of the frame pointer.
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D48282
llvm-svn: 336088
|
| |
|
|
|
|
|
|
|
|
|
| |
Several runtime entry points have not been tested from non-OpenMP threads. This
adds tests to an existing testcase. While at it, the testcase was reformatted
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D48124
llvm-svn: 336087
|
| |
|
|
|
|
|
|
|
|
|
| |
Especially the thread_end callback has not been tested before.
This adds a testcase for nested and non-nested threads.
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D47824
llvm-svn: 336086
|
| |
|
|
|
|
|
|
|
| |
The current implementation always provides the thread-num for the current
parallel region. This patch fixes the behavior for ancestor levels >0.
Differential Revision: https://reviews.llvm.org/D46533
llvm-svn: 336085
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change fixes the issue that arises when we duplicate condition from
the predecessor block. If the condition's arguments are not considered alive
across the blocks, fast regalloc gets confused and starts generating reloads
from the slots that have never been spilled to. This change also leads to
smaller code given that, unlike on architectures with condition codes, on
Mips we can branch directly on register value, thus we gain nothing by
duplication.
Patch by Dragan Mladjenovic.
Differential Revision: https://reviews.llvm.org/D48642
llvm-svn: 336084
|
| |
|
|
| |
llvm-svn: 336083
|
| |
|
|
|
|
|
|
| |
This is followup for r335958.
Thanks to Rui for noticing.
llvm-svn: 336082
|
| |
|
|
|
|
|
|
| |
Compare vector elements with a signed/unsigned immediate, e.g.
cmpgt p0.s, p0/z, z0.s, #-16
cmphi p0.s, p0/z, z0.s, #127
llvm-svn: 336081
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch changes the return types for ocl_get_* functions during SPIR code generation. Because these functions return size_t types, the return type needs to be changed to the actual size of size_t on the device.
Based on work by Michal Babej and Pekka Jääskeläinen
Patch by: Alain Denzler
Reviewers: grosser, philip.pfaffe, bollu
Reviewed By: grosser, philip.pfaffe
Subscribers: nemanjai, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D48774
llvm-svn: 336080
|
| |
|
|
|
|
|
|
|
| |
These patches were previously reverted as they led to
buildbot time-outs caused by large switch statement in
printAliasInstr when using UBSan and O3. The issue has
been addressed with a workaround (r335525).
llvm-svn: 336079
|
| |
|
|
| |
llvm-svn: 336078
|
| |
|
|
|
|
|
|
| |
compact and make it easier to see the similarities. NFC
It looks like someone ran clang-format over this entire file which reformatted these switches into a multiline form. But I think the single line form is more useful here.
llvm-svn: 336077
|
| |
|
|
| |
llvm-svn: 336076
|
| |
|
|
|
|
|
|
|
|
| |
search.
I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier.
Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but its probably not worth it right now.
llvm-svn: 336075
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
multiple for i64 pre-inc load/store
For the below case, pre-inc prep think it's a good candidate to use pre-inc for the bucket, but 64bit integer load/store update (pre-inc) instruction on Power requires the displacement field should be DS-form (4's multiple). Since it can't satisfy the constraint, we have to do some fix ups later. As below, the original load/stores could be well-form, it makes things worse.
unsigned long long result = 0;
unsigned long long foo(char *p, unsigned long long n) {
for (unsigned long long i = 0; i < n; i++) {
unsigned long long x1 = *(unsigned long long *)(p - 50000 + i);
unsigned long long x2 = *(unsigned long long *)(p - 61024 + i);
unsigned long long x3 = *(unsigned long long *)(p - 62048 + i);
unsigned long long x4 = *(unsigned long long *)(p - 64096 + i);
result *= x1 * x2 * x3 * x4;
}
return result;
}
Patch by jedilyn(Kewen Lin).
Differential Revision: https://reviews.llvm.org/D48813
--This line, and those below, will be ignored--
M lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
A test/CodeGen/PowerPC/preincprep-i64-check.ll
llvm-svn: 336074
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2
Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar
Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47103
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
|
| |
|
|
|
|
|
|
|
|
| |
section flags in the ELF assembler. This matches the defaults
given in the rest of MC.
Fixes PR37997 where we couldn't assemble our own assembly output
without warnings.
llvm-svn: 336072
|
| |
|
|
|
|
|
|
| |
clang intrinsic names.
I thought I fixed these yesterday, but I guess I missed a few.
llvm-svn: 336071
|
| |
|
|
|
|
|
|
| |
X86InstrInfo::commuteInstructionImpl.
findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl.
llvm-svn: 336070
|
| |
|
|
|
|
| |
instead of an opcode. NFCI.
llvm-svn: 336069
|