| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
x86_sse42_crc32_32_8 and was not mapped to a clang builtin. I'm not sure why this form of the instruction is even called out explicitly in the docs. Also add AutoUpgrade support to convert it into the other intrinsic with appropriate trunc and zext.
llvm-svn: 192672
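The trunc/zext upgrade described above can be illustrated with a small software model. This is only a sketch: it assumes the removed form is the variant taking a 64-bit accumulator and a byte, and the function names crc32c_32_8, crc32c_64_8 and crc32c are hypothetical stand-ins, not the LLVM intrinsics themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Software model of one CRC32C update step on a single byte, using the
// reflected Castagnoli polynomial 0x82F63B78.
inline uint32_t crc32c_32_8(uint32_t crc, uint8_t byte) {
  crc ^= byte;
  for (int bit = 0; bit < 8; ++bit)
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  return crc;
}

// Hypothetical model of the 64-bit-accumulator form, written the way the
// upgrade is described: trunc the accumulator, run the 32-bit step, zext.
inline uint64_t crc32c_64_8(uint64_t crc, uint8_t byte) {
  uint32_t narrow = static_cast<uint32_t>(crc);             // trunc
  return static_cast<uint64_t>(crc32c_32_8(narrow, byte));  // zext
}

// Convenience wrapper: full CRC32C over a buffer with the usual
// initial value and final inversion.
inline uint32_t crc32c(const unsigned char *data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i)
    crc = crc32c_32_8(crc, data[i]);
  return crc ^ 0xFFFFFFFFu;
}
```

In this model the upper 32 bits of the wide accumulator never influence the result, which is why the wide form carries no extra information.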
|
| |
|
|
| |
llvm-svn: 191927
|
| |
|
|
|
|
|
|
|
|
| |
The heuristic was added to avoid spending too much compile time. A specially
crafted test case (PR17461, PR16474) with many uses on a select or bitcast
instruction can still trigger the slow case. Add a check for that case.
This only affects compile time; I don't have a good way to test it.
llvm-svn: 191896
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
infrastructure.
This was essentially work toward PGO based on a design that had several
flaws, partially dating from a time when LLVM had a different
architecture, and with an effort to modernize it abandoned without being
completed. Since then, it has bitrotted for several years further. The
result is nearly unusable, and isn't helping any of the modern PGO
efforts. Instead, it is getting in the way, adding confusion about PGO
in LLVM and distracting everyone with maintenance on essentially dead
code. Removing it paves the way for modern efforts around PGO.
Among other effects, this removes the last of the runtime libraries from
LLVM. Those are being developed in the separate 'compiler-rt' project
now, with somewhat different licensing specifically more appropriate for
runtimes.
llvm-svn: 191835
|
| |
|
|
|
|
| |
Patch by Alp Toker.
llvm-svn: 191757
|
| |
|
|
|
|
| |
PR17425.
llvm-svn: 191741
|
| |
|
|
| |
llvm-svn: 191675
|
| |
|
|
|
|
|
|
| |
cyclic GEP.
Those can occur in dead code. PR17402.
llvm-svn: 191644
|
| |
|
|
| |
llvm-svn: 191585
|
| |
|
|
| |
llvm-svn: 191579
|
| |
|
|
| |
llvm-svn: 191574
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the command line argument "struct-path-tbaa", since we should not depend
on a command line argument to decide which format the IR file is using. Instead,
we check the first operand of the tbaa tag node: if it is an MDNode, we treat
it as struct-path aware TBAA format; otherwise, we treat it as scalar TBAA
format.
Once clang uses the struct-path aware TBAA format regardless of whether
struct-path-tbaa is on, and we can auto-upgrade existing bc files, support
for the scalar TBAA format can be dropped.
Existing test cases are updated to use the struct-path aware TBAA format.
llvm-svn: 191538
|
| |
|
|
|
|
|
| |
This code isn't ready to deal with allocation functions where the return is not
the allocated pointer. The checks below will reject posix_memalign anyway.
llvm-svn: 191319
|
| |
|
|
| |
llvm-svn: 191315
|
| |
|
|
|
|
| |
We really don't want to optimize malloc return value checks away.
llvm-svn: 191313
|
| |
|
|
|
|
|
|
|
|
|
|
| |
NULL.
This is safe per C++11 18.6.1.1p3: [operator new returns] a non-null pointer to
suitably aligned storage (3.7.4), or else throw a bad_alloc exception. This
requirement is binding on a replacement version of this function.
Brings us a tiny bit closer to eliminating more vector push_backs.
llvm-svn: 191310
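A minimal sketch of the kind of source this rule affects. The helper names makeCounter and makeCounterNoThrow are hypothetical; the point is only that the throwing form of operator new cannot yield a null pointer, while the nothrow form can.

```cpp
#include <cassert>
#include <new>

// Throwing form: per the rule quoted above, the result is either a
// non-null pointer or the call exits via std::bad_alloc, so the null
// check below can never fire and is a candidate for removal.
inline int *makeCounter() {
  int *p = new int(0);
  if (p == nullptr)
    return nullptr;  // dead under the quoted requirement
  return p;
}

// Nothrow form: this overload reports failure by returning null, so a
// null check on its result must be kept.
inline int *makeCounterNoThrow() {
  return new (std::nothrow) int(0);
}
```

The distinction matters for callers: only checks on the throwing form are redundant.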
|
| |
|
|
|
|
|
|
|
| |
Overflow doesn't affect the correctness of equalities. Computing this is cheap:
we just reuse the computation for the inbounds case and try to peel off more
non-inbounds GEPs. This pattern is unlikely to ever appear in code generated by
Clang, but SCEV occasionally produces it.
llvm-svn: 191200
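The claim that overflow does not affect equalities can be checked with plain wrapping arithmetic. This is a sketch using 64-bit unsigned integers as a stand-in for pointer arithmetic; addressesEqual and addressLess are hypothetical helpers, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Compare two addresses computed from the same base. Unsigned addition
// wraps modulo 2^64, and adding a fixed base is a bijection modulo 2^64,
// so the sums are equal exactly when the offsets are equal.
inline bool addressesEqual(uint64_t base, uint64_t offA, uint64_t offB) {
  return base + offA == base + offB;
}

// Ordering, by contrast, is not preserved once one of the sums wraps.
inline bool addressLess(uint64_t base, uint64_t offA, uint64_t offB) {
  return base + offA < base + offB;
}
```

With a base near the top of the address range, addressesEqual still agrees with comparing the offsets directly, whereas addressLess can disagree with offA < offB. That is why only the equality case can drop the inbounds requirement.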
|
| |
|
|
|
|
|
|
| |
If address space 0 was smaller than the address space
in a constant inttoptr/ptrtoint pair, the wrong mask size
would be used.
llvm-svn: 190899
|
| |
|
|
| |
llvm-svn: 190886
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Upcoming SLP vectorization improvements will want to be able to estimate costs
of horizontal reductions. Add infrastructure to support this.
We model reductions as a series of (shufflevector,add) tuples ultimately
followed by an extractelement. For example, for an add-reduction of <4 x float>
we could generate the following sequence:
  (v0, v1, v2, v3)
    \   \  /  /
      \  \  /
        +  +
  (v0+v2, v1+v3, undef, undef)
     \      /
  ((v0+v2) + (v1+v3), undef, undef, undef)

  %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
                            <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
  %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
                             <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
  %r = extractelement <4 x float> %bin.rdx8, i32 0

This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
that will allow clients to ask for the cost of such a reduction (as backends
might generate more efficient code than the cost of the individual instructions
summed up). This interface is exercised by the CostModel analysis pass which
looks for reduction patterns like the one above - starting at extractelements -
and if it sees a matching sequence will call the cost model interface.
We will also support a second form of pairwise reduction that is well supported
on common architectures (haddps, vpadd, faddp).
  (v0, v1, v2, v3)
    \  /    \  /
  (v0+v1, v2+v3, undef, undef)
     \      /
  ((v0+v1)+(v2+v3), undef, undef, undef)

  %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
                                <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
  %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
                                <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
  %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
                                <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
                                <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
  %r = extractelement <4 x float> %bin.rdx.1, i32 0
llvm-svn: 190876
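The two reduction shapes described above can be modelled on a plain four-element array. This is a scalar sketch for illustration only: Vec4, reduceSplitting and reducePairwise are hypothetical names, and lanes that the IR leaves undef are set to zero here.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// First form: add the upper half onto the lower half (lanes 2,3 onto
// lanes 0,1), then add lane 1 onto lane 0 and extract lane 0.
inline float reduceSplitting(const Vec4 &v) {
  Vec4 step1 = {v[0] + v[2], v[1] + v[3], 0.0f, 0.0f};
  Vec4 step2 = {step1[0] + step1[1], 0.0f, 0.0f, 0.0f};
  return step2[0];
}

// Pairwise form: add adjacent lanes (0 with 1, 2 with 3), then add the
// two partial sums and extract lane 0.
inline float reducePairwise(const Vec4 &v) {
  Vec4 step1 = {v[0] + v[1], v[2] + v[3], 0.0f, 0.0f};
  Vec4 step2 = {step1[0] + step1[1], 0.0f, 0.0f, 0.0f};
  return step2[0];
}
```

Both shapes take log2(4) = 2 add steps before the final extract. They differ only in which lanes are combined at each step, which is what the Pairwise flag in the cost query distinguishes.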
|
| |
|
|
|
|
| |
a volatile load, or a volatile store.
llvm-svn: 190631
|
| |
|
|
| |
llvm-svn: 190567
|
| |
|
|
|
|
|
|
|
| |
Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.
llvm-svn: 190542
|
| |
|
|
| |
llvm-svn: 190425
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
instead of having its own implementation.
The implementation of isTBAAVtableAccess is in TypeBasedAliasAnalysis.cpp
since it is related to the format of TBAA metadata.
The path for struct-path tbaa will be exercised by
test/Instrumentation/ThreadSanitizer/read_from_global.ll, vptr_read.ll, and
vptr_update.ll when struct-path tbaa is on by default.
llvm-svn: 190216
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Revert unintentional commit (of an unreviewed change).
Original commit message:
Add getUnrollingPreferences to TTI
Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.
llvm-svn: 189566
|
| |
|
|
|
|
|
|
|
| |
Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.
llvm-svn: 189565
|
| |
|
|
| |
llvm-svn: 189527
|
| |
|
|
| |
llvm-svn: 189290
|
| |
|
|
| |
llvm-svn: 189173
|
| |
|
|
|
|
|
|
|
|
| |
...so that it can be used for z too. Most of the code is the same.
The only real change is to use TargetTransformInfo to test when a sqrt
instruction is available.
The pass is opt-in because at the moment it only handles sqrt.
llvm-svn: 189097
|
| |
|
|
| |
llvm-svn: 188932
|
| |
|
|
| |
llvm-svn: 188844
|
| |
|
|
| |
llvm-svn: 188831
|
| |
|
|
|
|
|
|
| |
Also fix it calculating the wrong value. The struct index
is not a ConstantInt, so it was being interpreted as an array
index.
llvm-svn: 188713
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes SCEVExpander so that it does not create multiple distinct induction
variables for duplicate PHI entries. Specifically, given some code like this:
do.body6: ; preds = %do.body6, %do.body6, %if.then5
%end.0 = phi i8* [ undef, %if.then5 ], [ %incdec.ptr, %do.body6 ], [ %incdec.ptr, %do.body6 ]
...
Note that it is legal to have multiple entries for a basic block so long as the
associated value is the same. So the above input is okay, but expanding an
AddRec in this loop could produce code like this:
do.body6: ; preds = %do.body6, %do.body6, %if.then5
%indvar = phi i64 [ %indvar.next, %do.body6 ], [ %indvar.next1, %do.body6 ], [ 0, %if.then5 ]
%end.0 = phi i8* [ undef, %if.then5 ], [ %incdec.ptr, %do.body6 ], [ %incdec.ptr, %do.body6 ]
...
%indvar.next = add i64 %indvar, 1
%indvar.next1 = add i64 %indvar, 1
And this is not legal because there are two PHI entries for %do.body6 each with
a distinct value.
Unfortunately, I don't have an in-tree test case.
llvm-svn: 188614
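The legality rule stated above (duplicate PHI entries for one predecessor must carry the same value) can be expressed as a small check. This is a sketch over strings, not LLVM's PHINode API; Incoming and hasConsistentIncoming are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One PHI operand: (predecessor block name, incoming value name).
using Incoming = std::pair<std::string, std::string>;

// Returns true if every predecessor block that appears more than once
// is always paired with the same incoming value.
inline bool hasConsistentIncoming(const std::vector<Incoming> &phi) {
  std::map<std::string, std::string> seen;
  for (const auto &entry : phi) {
    auto result = seen.emplace(entry.first, entry.second);
    // emplace fails when the block was already recorded; the PHI is then
    // legal only if the recorded value matches this entry's value.
    if (!result.second && result.first->second != entry.second)
      return false;
  }
  return true;
}
```

Applied to the examples above, the %end.0 PHI passes the check, while the %indvar PHI with both %indvar.next and %indvar.next1 coming from %do.body6 fails it.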
|
| |
|
|
|
|
|
|
|
|
|
|
| |
to find loops if the From and To instructions were in the same block.
Refactor the code a little now that we sometimes need to start the CFG-walking
algorithm with more than one starting basic block.
Special thanks to Andrew Trick for catching an error in my understanding of
natural loops in code review.
llvm-svn: 188236
|
| |
|
|
|
|
|
| |
e.g. Use Ty->getPointerElementType()
instead of cast<PointerType>(Ty)->getElementType()
llvm-svn: 188223
|
| |
|
|
| |
llvm-svn: 188219
|
| |
|
|
| |
llvm-svn: 188140
|
| |
|
|
|
|
|
| |
Inlining between functions with different values of sanitize_* attributes
leads to over- or under-sanitizing, which is always bad.
llvm-svn: 187967
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All libm floating-point rounding functions, except for round(), had their own
ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm
adding ISD::FROUND so that round() can be custom lowered as well.
For the most part, this is straightforward. I've added an intrinsic
and a matching ISD node just like those for nearbyint() and friends. The
SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed
fround).
This will be used by the PowerPC backend in a follow-up commit.
llvm-svn: 187926
|
| |
|
|
| |
llvm-svn: 187806
|
| |
|
|
|
|
| |
Remove assertion that the verifier should catch.
llvm-svn: 187692
|
| |
|
|
| |
llvm-svn: 187635
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This fix is very lightweight. The same fix already existed for AddRec
but was missing for NAry expressions.
This is obviously an improvement and I'm unsure how to test compile
time problems.
Patch by Xiaoyi Guo!
llvm-svn: 187475
|
| |
|
|
|
|
|
|
|
| |
instructions
Call into ComputeMaskedBits to figure out which bits are set on both add
operands and determine if the value is a power-of-two-or-zero or not.
llvm-svn: 187445
|
| |
|
|
| |
llvm-svn: 187284
|
| |
|
|
|
|
|
|
|
|
| |
Adds unit tests for it too.
Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.
llvm-svn: 187283
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
conditions
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce the number of branches. The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
|