Constant merge can merge a constant with implicit alignment with one that has
explicit alignment. Before this change it assumed that the explicit
alignment was higher than the implicit one, causing the result to be
underaligned in some cases.
Fixes PR17815.
Patch by Chris Smowton!
llvm-svn: 194506
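To illustrate the corrected rule, here is a minimal C++ sketch, not the actual ConstantMerge code. It assumes that an explicit alignment of 0 means "none specified" and that a global without one gets the preferred alignment of its type; `effectiveAlignment` and `mergedAlignment` are hypothetical names. The merged constant takes the larger of the two effective alignments instead of assuming the explicit one wins.

```cpp
#include <algorithm>

// Hypothetical helper: the alignment a global actually gets. An explicit
// alignment of 0 means "not specified", in which case the (assumed)
// preferred alignment of the global's type applies.
unsigned effectiveAlignment(unsigned ExplicitAlign, unsigned PreferredAlign) {
  return ExplicitAlign != 0 ? ExplicitAlign : PreferredAlign;
}

// Both constants have the same type (they are identical), so they share
// PreferredAlign. The merged global must satisfy the stricter of the two
// effective alignments, so take the maximum rather than assuming the
// explicit alignment is always the larger one.
unsigned mergedAlignment(unsigned ExplicitA, unsigned ExplicitB,
                         unsigned PreferredAlign) {
  return std::max(effectiveAlignment(ExplicitA, PreferredAlign),
                  effectiveAlignment(ExplicitB, PreferredAlign));
}
```

For example, merging a constant marked `align 4` with an identical one whose implicit alignment is 16 must yield 16; under the old assumption the result would be 4.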
Both simpler and more powerful than the hand-rolled folding logic.
llvm-svn: 194475
llvm-svn: 194457
Also updated test files that were generated from this change.
llvm-svn: 194453
The symptom is that an assertion is triggered. I added the assertion to
detect the situation where a value is propagated from dead blocks.
(We could simply get rid of the assertion; it is safe to do so, because
propagating a value from a dead block to a live join node is fine.)
The root cause of this bug is that edge splitting is performed on the fly,
and the edge being split could be a dead edge; therefore the block that
splits the critical edge needs to be flagged "dead" as well.
There are 3 ways to fix this bug:
1) Get rid of the assertion, as mentioned earlier.
2) When a dead edge is split, flag the inserted block "dead".
3) Proactively split the critical edges connecting dead and live blocks when
new dead blocks are revealed.
This fix goes with 3), at the cost of 2 additional LOC.
The test case was added by Rafael the other day.
llvm-svn: 194424
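Option 3 can be sketched on a toy CFG as follows. This is a hypothetical model, not GVN's data structures: blocks are integer indices, `Succs` holds one entry per edge, and an edge is treated as critical when its source has several successors and its destination several predecessors. Every block inserted on a dead-to-live critical edge is added to the dead set immediately, so no unflagged block can later appear on such an edge.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy CFG: blocks are indices and Succs[B] lists the
// successors of block B, one entry per edge.
struct CFG {
  std::vector<std::vector<int>> Succs;

  std::size_t numPreds(int B) const {
    std::size_t N = 0;
    for (const auto &S : Succs)
      for (int T : S)
        if (T == B)
          ++N;
    return N;
  }

  // Insert a new block on the edge Succs[From][SuccIdx] and return it.
  int splitEdge(int From, std::size_t SuccIdx) {
    int To = Succs[From][SuccIdx];
    int NewBB = static_cast<int>(Succs.size());
    Succs.push_back({To});
    Succs[From][SuccIdx] = NewBB;
    return NewBB;
  }
};

// Option 3: when new dead blocks are revealed, split every critical edge
// from a dead block to a live block right away and flag the inserted
// block dead.
void splitDeadToLiveCriticalEdges(CFG &G, std::set<int> &Dead) {
  // Iterate over a snapshot because Dead grows while we split.
  const std::vector<int> Snapshot(Dead.begin(), Dead.end());
  for (int D : Snapshot) {
    for (std::size_t I = 0; I < G.Succs[D].size(); ++I) {
      int S = G.Succs[D][I];
      bool Critical = G.Succs[D].size() > 1 && G.numPreds(S) > 1;
      if (Critical && !Dead.count(S))
        Dead.insert(G.splitEdge(D, I));
    }
  }
}
```

For example, with edges 0 -> {1, 2} and 1 -> {2, 3}, if block 1 becomes dead, the critical edge 1 -> 2 receives a new dead block while 1 -> 3 is left alone.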
No functional change, just better reporting.
llvm-svn: 194388
llvm-svn: 194374
a fix to PR17307 & 17308."
This causes PR17852.
This reverts commit d93e8a06b2ca09ab18f390cd514b7443e2e571f7.
Conflicts:
test/Transforms/GVN/cond_br2.ll
llvm-svn: 194348
This should be inconsequential and is work
towards removing the default address space
arguments.
llvm-svn: 194347
it is worthwhile to merge branches. It tries to estimate if the operands of the instruction that we want to hoist are ready. This commit marks function arguments as 'ready' because they require no calculation. This boosts libquantum and a few other workloads from the testsuite.
llvm-svn: 194346
llvm-svn: 194342
LoopUnswitch's code simplification routine has logic to convert conditional
branches into unconditional branches, after unswitching makes the condition
constant, and then remove any blocks that this renders dead. Unfortunately, this
code is dead, currently broken, and furthermore, has never been alive (at least
as far back as 2006).
No functionality change intended.
llvm-svn: 194277
conditional check + fail.
Due to the previously added overflow checks, we can have a retain/release
relation that is one-directional. This occurs specifically when we run into an
additive overflow causing us to drop state in only one direction. If that
occurs, we should bail and not optimize that retain/release instead of
asserting.
Apologies for the size of the testcase. It is necessary to cause the additive
CFG overflow to trigger.
rdar://15377890
llvm-svn: 194083
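A minimal sketch of the bail-out, assuming the pairing results are kept as two maps, one per direction. `PairMap`, `linksMatch` and `isBidirectional` are hypothetical names, not how ObjCARCOpt stores its state. Before optimizing, the caller checks that every link has a partner in the other direction and skips the pair otherwise.

```cpp
#include <map>
#include <set>

// Hypothetical: links found by the top-down walk (retain -> releases) and
// by the bottom-up walk (release -> retains), keyed by instruction id.
using PairMap = std::map<int, std::set<int>>;

// True if every link From -> To in Forward also appears as To -> From in
// Backward.
bool linksMatch(const PairMap &Forward, const PairMap &Backward) {
  for (const auto &[From, Tos] : Forward)
    for (int To : Tos) {
      auto It = Backward.find(To);
      if (It == Backward.end() || It->second.count(From) == 0)
        return false;
    }
  return true;
}

// If an overflow made one walk drop its state, the relation is
// one-directional: report it so the caller skips this retain/release pair
// instead of asserting.
bool isBidirectional(const PairMap &RetainToReleases,
                     const PairMap &ReleaseToRetains) {
  return linksMatch(RetainToReleases, ReleaseToRetains) &&
         linksMatch(ReleaseToRetains, RetainToReleases);
}
```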
As with the other loop unrolling parameters (the unrolling threshold, partial
unrolling, etc.), runtime unrolling can now also be controlled via the
constructor. This will be necessary for moving non-trivial unrolling late in
the pass manager (after loop vectorization).
No functionality change intended.
llvm-svn: 194027
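A sketch of the usual convention for such constructor parameters, assuming -1 means "not specified by the caller" and falls back to a global default that would normally come from a command-line option. `RuntimeUnrollDefault` and `LoopUnrollSketch` are hypothetical stand-ins, not LLVM's LoopUnroll pass, and only the runtime knob is modeled.

```cpp
#include <optional>

// Hypothetical stand-in for a global default that would normally be
// controlled by a command-line option.
bool RuntimeUnrollDefault = false;

// Hypothetical pass skeleton. A constructor argument of -1 means "not
// specified by the caller": fall back to the global default. 0 disables
// runtime unrolling and any other value enables it.
class LoopUnrollSketch {
public:
  explicit LoopUnrollSketch(int Runtime = -1) {
    if (Runtime != -1)
      UserRuntime = (Runtime != 0);
  }

  bool allowRuntimeUnrolling() const {
    return UserRuntime.value_or(RuntimeUnrollDefault);
  }

private:
  std::optional<bool> UserRuntime; // empty unless the caller chose a value
};
```

A pipeline that schedules unrolling after loop vectorization could then construct the pass with `LoopUnrollSketch(/*Runtime=*/1)` without changing the global default.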
llvm-svn: 194017
strict weak ordering.
STL debug mode checks this.
llvm-svn: 194015
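The comparator this commit fixes is not shown here, so as a generic illustration (hypothetical `Entry` type): a comparator built on `<=` is not irreflexive and therefore not a strict weak ordering, while a lexicographic comparison with `std::tie` is.

```cpp
#include <tuple>

// Hypothetical element type for illustration.
struct Entry {
  int Key;
  int Order;
};

// Broken: "<=" is not irreflexive (brokenLess(E, E) is true), so it is not
// a strict weak ordering. Passing it to std::sort is undefined behavior,
// and an STL debug mode can detect the violation.
bool brokenLess(const Entry &A, const Entry &B) { return A.Key <= B.Key; }

// Correct: compare lexicographically with "<" via std::tie, which is a
// strict weak ordering whenever the element comparisons are.
bool entryLess(const Entry &A, const Entry &B) {
  return std::tie(A.Key, A.Order) < std::tie(B.Key, B.Order);
}
```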
When the elements are extracted from a select on vectors
or a vector select, do the select on the extracted scalars
from the input if there is only one use.
llvm-svn: 194013
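The rewrite relies on a lane-wise identity, sketched here with `std::array` standing in for vector values (function names hypothetical). For a vector select, lane I of the result is chosen by lane I of the condition; for a select on vectors with a scalar condition, the same condition picks every lane. Either way, extracting after the select equals selecting between the extracted scalars, which only pays off when the select has no other uses, as the commit requires.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<int, 4>;
using Mask4 = std::array<bool, 4>;

// extractelement (select <4 x i1> C, A, B), I: build the whole selected
// vector, then take lane I.
int extractOfVectorSelect(const Mask4 &C, const Vec4 &A, const Vec4 &B,
                          std::size_t I) {
  Vec4 R{};
  for (std::size_t L = 0; L < 4; ++L)
    R[L] = C[L] ? A[L] : B[L];
  return R[I];
}

// Rewritten: select (extract C, I), (extract A, I), (extract B, I).
int selectOfExtracts(const Mask4 &C, const Vec4 &A, const Vec4 &B,
                     std::size_t I) {
  return C[I] ? A[I] : B[I];
}

// Select on vectors with a scalar condition: the condition is reused as is.
int selectOfExtractsScalarCond(bool C, const Vec4 &A, const Vec4 &B,
                               std::size_t I) {
  return C ? A[I] : B[I];
}
```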
llvm-svn: 193958
Instead of doing a RPO traversal of the whole function remember the blocks
containing gathers (typically <= 2) and scan them in dominator-first order.
The actual CSE is still quadratic, but I'm not confident that adding a
scoped hash table here is worth it as we're only looking at the generated
instructions and not arbitrary code.
llvm-svn: 193956
This reverts commit r193356, it caused PR17781.
A reduced test case covering this regression has been added to the test suite.
llvm-svn: 193955
llvm-svn: 193954
This adds a SimplifyLibCalls case that converts the special __sinpi and
__cospi (float & double variants) into a __sincospi_stret where appropriate to
remove duplicated work.
Patch by Tim Northover
llvm-svn: 193943
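The shape of the transformation at a call site can be sketched in C++. `SinCos` and `sincospiSketch` are hypothetical stand-ins for a combined routine such as `__sincospi_stret`; a real library implementation would share work such as argument reduction between the two results, whereas this stand-in simply computes both.

```cpp
#include <cmath>

// Hypothetical stand-in for a combined routine in the spirit of
// __sincospi_stret: one call returns both sin(pi*x) and cos(pi*x) in a
// struct. This sketch simply computes both with std::sin and std::cos.
struct SinCos {
  double Sin;
  double Cos;
};

SinCos sincospiSketch(double X) {
  const double Pi = std::acos(-1.0);
  return {std::sin(Pi * X), std::cos(Pi * X)};
}

// Code that would otherwise call __sinpi(x) and __cospi(x) on the same
// argument makes one combined call and reads both fields of the result.
double tanpiViaCombined(double X) {
  SinCos SC = sincospiSketch(X);
  return SC.Sin / SC.Cos;
}
```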
llvm-svn: 193927
Doing this with a hash map doesn't change behavior and avoids calling
isIdenticalTo O(n^2) times. This should probably eventually move into a utility
class shared with EarlyCSE and the limited CSE in the SLPVectorizer.
llvm-svn: 193926
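A minimal sketch of hash-based CSE over a toy instruction model (opcode plus operand value numbers; `Inst`, `InstHash` and `findLeaders` are hypothetical). Each instruction is looked up once in a hash map keyed on its contents, instead of being compared `isIdenticalTo`-style against every earlier instruction.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical instruction model: an opcode plus operand value numbers.
struct Inst {
  int Opcode;
  std::vector<int> Operands;
  bool operator==(const Inst &O) const {
    return Opcode == O.Opcode && Operands == O.Operands;
  }
};

// Combine the opcode and operands into one hash value.
struct InstHash {
  std::size_t operator()(const Inst &I) const {
    std::size_t H = std::hash<int>()(I.Opcode);
    for (int Op : I.Operands)
      H = H * 31 + std::hash<int>()(Op);
    return H;
  }
};

// For each instruction, return the index of the first identical one.
// One hash lookup per instruction replaces the pairwise comparisons.
std::vector<std::size_t> findLeaders(const std::vector<Inst> &Insts) {
  std::unordered_map<Inst, std::size_t, InstHash> Seen;
  std::vector<std::size_t> Leader(Insts.size());
  for (std::size_t I = 0; I < Insts.size(); ++I)
    Leader[I] = Seen.emplace(Insts[I], I).first->second;
  return Leader;
}
```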
llvm-svn: 193895
When the loop vectorizer was part of the SCC inliner pass manager, GVN would
run after the loop vectorizer, followed by instcombine. This way redundancy
(multiple uses) was removed and instcombine could perform scalarization on the
induction variables. Since the loop vectorizer has moved later, we no longer run
any form of redundancy elimination before we perform instcombine. This caused
vectorized induction variables to survive where they previously did not.
On a recent iMac this brings linpack from 6000 Mflops back up to 7000 Mflops.
This should also help lpbench and paq8p.
I ran a Release (without Asserts) build over the test-suite and did not see any
negative impact on compile time.
radar://15339680
llvm-svn: 193891
If we have a pointer to a single-element struct we can still build wide loads
and stores to it (if there is no padding).
llvm-svn: 193860
When a dependence check fails we can still try to vectorize loops with runtime
array bounds checks.
This helps linpack vectorize a loop in dgefa, bringing us back to 2x the
scalar performance on a corei7-avx.
radar://15339680
llvm-svn: 193853
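A sketch of the runtime-check idea on a daxpy-like loop (hypothetical function names; the inner block of four stands in for real vector instructions). If the accessed ranges are disjoint, the block-at-a-time path, which loads all four lanes before storing any, is safe; otherwise the scalar loop preserves the original semantics.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if the element ranges [A, A+N) and [B, B+N) are disjoint.
// Relational comparison of pointers into different objects is not
// specified by the language, so compare their integer values instead.
bool rangesDisjoint(const float *A, const float *B, std::size_t N) {
  std::uintptr_t AStart = reinterpret_cast<std::uintptr_t>(A);
  std::uintptr_t BStart = reinterpret_cast<std::uintptr_t>(B);
  std::uintptr_t Bytes = N * sizeof(float);
  return AStart + Bytes <= BStart || BStart + Bytes <= AStart;
}

// Dst[i] += K * Src[i], with a runtime check guarding the "vector" path.
void axpyWithRuntimeCheck(float *Dst, const float *Src, float K,
                          std::size_t N) {
  std::size_t I = 0;
  if (rangesDisjoint(Dst, Src, N)) {
    // Stand-in for vector code: load all four lanes, then store them.
    // This is only correct when the ranges do not overlap.
    for (; I + 4 <= N; I += 4) {
      float Tmp[4];
      for (std::size_t L = 0; L < 4; ++L)
        Tmp[L] = Dst[I + L] + K * Src[I + L];
      for (std::size_t L = 0; L < 4; ++L)
        Dst[I + L] = Tmp[L];
    }
  }
  // Scalar loop: the remainder of the vector path, or the whole loop when
  // the ranges may overlap.
  for (; I < N; ++I)
    Dst[I] += K * Src[I];
}
```

With `Dst = C + 1` and `Src = C`, the check fails and the scalar loop lets each element see the previous update, which the block-at-a-time path would not.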
Clear all data structures when resetting the RuntimeCheck data structure.
No test case. This was exposed by an upcoming change.
llvm-svn: 193852
Given that the backend does not handle "invoke asm" correctly ("invoke asm" will be
handled by SelectionDAGBuilder::visitInlineAsm, which does not have the right
setup for LPadToCallSiteMap) and we already made the assumption that inline asm
does not throw in InstCombiner::visitCallSite, we are going to make the same
assumption in Inliner to make sure we don't convert "call asm" to "invoke asm".
If it becomes necessary to add support for "invoke asm" later on, we will need
to modify the backend as well as remove the assumptions that inline asm does
not throw.
Fix rdar://15317907
llvm-svn: 193808
There are two ways one could implement hiding of linkonce_odr symbols in LTO:
* LLVM tells the linker which symbols can be hidden if not used from native
files.
* The linker tells LLVM which symbols are not used from other object files,
but will be put in the DSO symbol table if present.
GOLD's API is the second option. It was implemented almost 1:1 in LLVM by
passing the list down to internalize.
LLVM already had partial support for the first option. It is also very similar
to how ld64 handles hiding these symbols when *not* doing LTO.
This patch then
* removes the APIs for the DSO list.
* marks as LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN all linkonce_odr unnamed_addr
global values and other linkonce_odr values whose address is not used.
* makes the gold plugin responsible for handling the API mismatch.
llvm-svn: 193800
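The rule in the second bullet can be written as a predicate. This is a sketch over a hypothetical `GlobalInfo` record; how LLVM determines that an address is unused is not modeled here.

```cpp
// Hypothetical, simplified description of a global value.
enum class Linkage { External, Internal, LinkOnceODR };

struct GlobalInfo {
  Linkage L;
  bool UnnamedAddr; // the address is known not to be significant
  bool AddressUsed; // some user observes the address (e.g. compares it)
};

// A linkonce_odr value may get LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN if it
// is unnamed_addr, or otherwise if its address is not used.
bool canBeHidden(const GlobalInfo &G) {
  return G.L == Linkage::LinkOnceODR && (G.UnnamedAddr || !G.AddressUsed);
}
```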
llvm-svn: 193734
llvm-svn: 193720
llvm-svn: 193710
By vectorizing a series of srl, or, ... instructions we have obfuscated the
intention so much that the backend does not know how to fold this code away.
radar://15336950
llvm-svn: 193573
No test case, because with the current cost model we don't see a difference.
An upcoming ARM memory cost model change will expose and test this bug.
radar://15332579
llvm-svn: 193572
indirect memops.
llvm-svn: 193489
This patch implements quick lookup of whether a block is in a loop by maintaining a hash set of the loop's blocks.
It improves the efficiency of loop analysis considerably; the biggest improvement can be 5-6% (458.sjeng).
Below are the llc compilation times for our benchmarks before and after the patch. Each percentage is the trunk time divided by that column's time, so values above 100% mean the patched llc is faster.
Benchmark     llc - trunk            llc - patched
401.bzip2       0.339081  100.00%      0.329657  102.86%
403.gcc        19.853966  100.00%     19.605466  101.27%
429.mcf         0.049823  100.00%      0.048451  102.83%
433.milc        0.514898  100.00%      0.510217  100.92%
444.namd        1.109328  100.00%      1.103481  100.53%
445.gobmk       4.988028  100.00%      4.929114  101.20%
456.hmmer       0.843871  100.00%      0.825865  102.18%
458.sjeng       0.754238  100.00%      0.714095  105.62%
464.h264ref       2.9668  100.00%       2.90612  102.09%
471.omnetpp     4.556533  100.00%      4.511886  100.99%
bitmnp01        0.038168  100.00%        0.0357  106.91%
idctrn01        0.037745  100.00%      0.037332  101.11%
libquake2        3.78689  100.00%       3.76209  100.66%
libquake_       2.251525  100.00%      2.234104  100.78%
linpack         0.033159  100.00%      0.032788  101.13%
matrix01        0.045319  100.00%      0.043497  104.19%
nbench          0.333161  100.00%      0.329799  101.02%
tblook01        0.017863  100.00%      0.017666  101.12%
ttsprk01        0.054337  100.00%      0.053057  102.41%
Reviewer : Andrew Trick <atrick@apple.com>, Hal Finkel <hfinkel@anl.gov>
Approver : Andrew Trick <atrick@apple.com>
Test : Pass make check-all & llvm test-suite
llvm-svn: 193460
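A minimal sketch of the data-structure change, assuming a loop keeps its blocks in a vector for ordered iteration and mirrors them in a hash set for membership queries. `LoopSketch` and this `BasicBlock` are hypothetical, not LLVM's `LoopBase`.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical minimal block type for this sketch.
struct BasicBlock {
  int Id;
};

// Hypothetical loop: Blocks keeps insertion order for iteration, and
// BlockSet mirrors it so membership queries are expected O(1) instead of
// a linear scan over Blocks.
class LoopSketch {
public:
  void addBlock(const BasicBlock *BB) {
    if (BlockSet.insert(BB).second) // skip duplicates
      Blocks.push_back(BB);
  }

  bool contains(const BasicBlock *BB) const { return BlockSet.count(BB) != 0; }

  const std::vector<const BasicBlock *> &blocks() const { return Blocks; }

private:
  std::vector<const BasicBlock *> Blocks;
  std::unordered_set<const BasicBlock *> BlockSet;
};
```

The cost of this design is extra memory and the need to keep both containers in sync whenever blocks are added or removed.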
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)
When SCEV expands a recurrence outside of a loop it attempts to scale
by the stride of the recurrence. Chained recurrences don't work that
way. We could compute binomial coefficients, but would have to
guarantee that the chained AddRecs are in a perfectly reduced form.
llvm-svn: 193438
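A worked C++ check of why scaling by the stride fails. For a chained recurrence {A,+,B,+,C}, the value after N iterations is A + B*binom(N,1) + C*binom(N,2); treating it like the affine A + B*N drops the C term. The function names are hypothetical.

```cpp
#include <cstdint>

// Evaluate {A,+,B,+,C} by stepping: the value starts at A, and its step
// starts at B and grows by C each iteration.
int64_t evalByStepping(int64_t A, int64_t B, int64_t C, int64_t N) {
  int64_t Val = A, Step = B;
  for (int64_t I = 0; I < N; ++I) {
    Val += Step;
    Step += C;
  }
  return Val;
}

// Closed form with binomial coefficients: A + B*binom(N,1) + C*binom(N,2).
int64_t evalBinomial(int64_t A, int64_t B, int64_t C, int64_t N) {
  return A + B * N + C * (N * (N - 1) / 2);
}

// Scaling by the stride treats the recurrence as affine (A + B*N), which
// ignores C and is only correct when C == 0.
int64_t evalByStride(int64_t A, int64_t B, int64_t N) { return A + B * N; }
```

For {0,+,1,+,1} (the sequence 0, 1, 3, 6, 10, ...), four iterations give 10, while A + B*N gives 4.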
This patch teaches GlobalStatus to analyze a call that uses the global value as
a callee, not as an argument.
With this change internalize can handle the common use of linkonce_odr
functions. This reduces the number of linkonce_odr functions in an LTO build of
clang (checked with the emit-llvm gold plugin option) from 1730 to 60.
llvm-svn: 193436
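A toy version of the distinction (hypothetical `CallUse` enum and `isAddressTaken`; the real GlobalStatus analysis tracks much more): a function used only as the callee of direct calls does not leak its address, whereas passing it as an argument does.

```cpp
#include <vector>

// Hypothetical classification of how a call instruction uses a function.
enum class CallUse { AsCallee, AsArgument };

// A direct call only jumps to the function; passing the function as an
// argument hands its address to code that may store or compare it, so
// that counts as taking the address.
bool isAddressTaken(const std::vector<CallUse> &Uses) {
  for (CallUse U : Uses)
    if (U == CallUse::AsArgument)
      return true;
  return false;
}
```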
The loop vectorizer does not currently understand how to vectorize
extractelement instructions. The existing check, which excluded all
vector-valued instructions, did not catch extractelement instructions because
it checked only the return value. As a result, vectorization would proceed,
producing illegal instructions like this:
%58 = extractelement <2 x i32> %15, i32 0
%59 = extractelement i32 %58, i32 0
where the second extractelement is illegal because its first operand is not a vector.
llvm-svn: 193434
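A sketch of the missing check on a hypothetical type model: looking only at the result type accepts extractelement, whose result is scalar, so operand types must be checked as well. This is one way to express the fix, not necessarily the exact check the vectorizer uses.

```cpp
#include <vector>

// Hypothetical instruction model: whether the result and each operand
// is vector-typed.
struct InstTypes {
  bool ResultIsVector;
  std::vector<bool> OperandIsVector;
};

// The old style of check: only the result type is inspected.
bool canWidenResultOnly(const InstTypes &I) { return !I.ResultIsVector; }

// Reject the instruction if the result or any operand is a vector.
bool canWiden(const InstTypes &I) {
  if (I.ResultIsVector)
    return false;
  for (bool V : I.OperandIsVector)
    if (V)
      return false;
  return true;
}
```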
Patch by: Vincent Lejeune
llvm-svn: 193356
Make sure we mark all loops (scalar and vector) when vectorizing,
so that we don't try to vectorize them anymore. Also, set unroll
to 1, since this is what we check for on early exit.
llvm-svn: 193349
LLVM optimizers may widen accesses to packed structures so that they extend past the end of the structure itself, but such accesses should stay in bounds up to the alignment of the object.
llvm-svn: 193317
Reviewed by Andy
llvm-svn: 193303
llvm-svn: 193292
llvm-svn: 193268
Major steps include:
1) Introduces a not-addr-taken bit-field in GlobalVariable.
2) The GlobalOpt pass sets "not-address-taken" if it proves a global variable
doesn't have its address taken.
3) AA uses this info for disambiguation.
llvm-svn: 193251
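A minimal sketch of how an alias query could use the bit, on a hypothetical pointer model where a pointer is either exactly the address of a global or a value of unknown origin. The reasoning is that a pointer of unknown origin cannot point to a global whose address is never taken. None of these types are LLVM's AA interfaces.

```cpp
enum class AliasResult { NoAlias, MayAlias, MustAlias };

// Hypothetical global: NotAddressTaken would be set by a GlobalOpt-like
// analysis after proving nothing takes the global's address.
struct GlobalVar {
  bool NotAddressTaken;
};

// Hypothetical pointer: either exactly the address of a global
// (Base != nullptr) or a value of unknown origin (Base == nullptr).
struct Ptr {
  const GlobalVar *Base;
};

AliasResult alias(const Ptr &P, const Ptr &Q) {
  if (P.Base && Q.Base)
    return P.Base == Q.Base ? AliasResult::MustAlias : AliasResult::NoAlias;
  // At most one side names a global. A pointer of unknown origin cannot
  // point to a global whose address is never taken.
  const GlobalVar *G = P.Base ? P.Base : Q.Base;
  if (G && G->NotAddressTaken)
    return AliasResult::NoAlias;
  return AliasResult::MayAlias;
}
```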
llvm-svn: 193130
v2:
- Use CI->cannotDuplicate()
llvm-svn: 193115