| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
Mem2Reg shouldn't be optimizing a function that is marked
optnone. There is a test checking this that fails when mem2reg is
explicitly added to the standard pass pipeline.
llvm-svn: 255336
|
| |
|
|
|
|
| |
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.
llvm-svn: 255334
|
| |
|
|
| |
llvm-svn: 255331
|
| |
|
|
| |
llvm-svn: 255330
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, each function's on-disk VP data is 'pointed'
to by the Value field of per-function ProfileData structue, and
read relies on this field (relocated with ValueDataDelta field)
to read the value data. However this means the Value field needs
to be updated during runtime before dumping, which creates undesirable
data races.
With this patch, the reading of VP data no longer depends on Value
field. There is no format change. ValueDataDelta header field becomes
obsolute but will be kept for compatibility reason (will be removed
next time the raw format change is needed).
llvm-svn: 255329
|
| |
|
|
| |
llvm-svn: 255322
|
| |
|
|
| |
llvm-svn: 255321
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reduce memory usage.
Previously, LazyValueInfoCache inserted overdefined lattice values into
both ValueCache and OverDefinedCache. This wasn't necessary and was
causing LazyValueInfo to use an excessive amount of memory in some cases.
This patch changes LazyValueInfoCache to insert overdefined values only
into OverDefinedCache. The memory usage decreases by 70 to 75% when one
of the files in llvm is compiled.
rdar://problem/11388615
Differential revision: http://reviews.llvm.org/D15391
llvm-svn: 255320
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Access to aligned globals gives us a chance to peephole optimize nonzero
offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't
overflow the available displacement. For example:
addis 3, 2, b4v@toc@ha
addi 4, 3, b4v@toc@l
lbz 5, b4v@toc@l(3) ; This is the result of the current peephole
lbz 6, 1(4) ; optimizer
lbz 7, 2(4)
lbz 8, 3(4)
If b4v is 4-byte aligned, we can skip using register 4 because we know
that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate:
addis 3, 2, b4v@toc@ha
lbz 4, b4v@toc@l(3)
lbz 5, b4v@toc@l+1(3)
lbz 6, b4v@toc@l+2(3)
lbz 7, b4v@toc@l+3(3)
Saving a register and an addition.
Larger alignments allow larger structures/arrays to be optimized.
llvm-svn: 255319
|
| |
|
|
| |
llvm-svn: 255318
|
| |
|
|
| |
llvm-svn: 255317
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Previously in the conversion cost table there are no entries for integer-integer
conversions on SSE2. This will result in imprecise costs for certain vectorized
operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers
are counted from the result of running llc on the new test case in this patch.
Differential revision: http://reviews.llvm.org/D15132
llvm-svn: 255315
|
| |
|
|
| |
llvm-svn: 255313
|
| |
|
|
| |
llvm-svn: 255306
|
| |
|
|
|
|
|
|
|
|
|
|
| |
x)) combines for ppc_fp128, since signbit computation is more
complicated.
Discussion thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html
Patch by Tim Shen!
llvm-svn: 255305
|
| |
|
|
| |
llvm-svn: 255304
|
| |
|
|
| |
llvm-svn: 255303
|
| |
|
|
|
|
| |
C API.
llvm-svn: 255302
|
| |
|
|
| |
llvm-svn: 255301
|
| |
|
|
|
|
| |
on the llvm mailing lists.
llvm-svn: 255300
|
| |
|
|
|
|
|
|
|
| |
This was causing bad code gen and assembly that won't assemble, as
mixed altivec and vsx code would end up with a vsx high register
assigned to an altivec instruction, which won't work. Constraining the
classes allows the optimization to proceed.
llvm-svn: 255299
|
| |
|
|
|
|
|
|
| |
-fprofile-instr-generate
This is the first step in supporting PGO data generation via CMake. I've marked the option as advanced and experimental until it is fleshed out further.
llvm-svn: 255298
|
| |
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D15339
done
llvm-svn: 255296
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: As a follow-up to rL255054 I wasn't able to convince myself that the code did what I thought, so I wrote more tests.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15371
llvm-svn: 255295
|
| |
|
|
| |
llvm-svn: 255292
|
| |
|
|
| |
llvm-svn: 255291
|
| |
|
|
|
|
| |
PR25763 demonstrated an issue with D14683 - vector comparison constant folding only works for i1 results, so we need to split off the sign-extension of the result to the required type. Luckily this can be done with the existing type legalization code.
llvm-svn: 255289
|
| |
|
|
|
|
| |
I see a few bots timing out, so I'm speculatively disabling r255247.
llvm-svn: 255286
|
| |
|
|
| |
llvm-svn: 255272
|
| |
|
|
|
|
| |
Using %t rather %T/<specific_name> as the temporary profdata filename.
llvm-svn: 255271
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid O(N^2) behaviour when checking for bad bitcasts in `ConstantExpr`s
buried inside of aggregate initializers to `GlobalVariable`s. I've:
- centralized the "visited" set for recursing through `ConstantExpr`s so
that expressions are only visited once per Verifier run,
- removed the duplicate logic for the stack visit, and
- avoided recursing into other `GlobalValue`s.
This recovers roughly a 100x time difference in clang compiles of a
particular input file (filled with large cross-referencing tables) that
depends on whether `-disable-llvm-verifier` is on. This slowdown was
caused by r187506, which introduced these checks.
Now, avoiding `-disable-llvm-verifier` only causes a 2x slowdown for
this case.
(Interestingly, dumping the textual IR for this file starts at least
50GB of global variable initializers (I don't know the total, since I
killed the dump)...)
llvm-svn: 255269
|
| |
|
|
| |
llvm-svn: 255265
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds support for in-memory round-trip of sample profile data along with basic
round trip unit tests. This will also make it easier to include unit tests for
future changes to sample profiling.
Reviewers: davidxl, dnovillo, silvas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15211
llvm-svn: 255264
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Convert f16 vectors to corresponding f32 vectors before doing the
conversion to int.
Add tests for v4f16, v8f16.
Reviewers: ab, jmolloy
Subscribers: llvm-commits, srhines
Differential Revision: http://reviews.llvm.org/D14936
llvm-svn: 255263
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a redo of r255137 (reverted at r255227) which was a redo of
r255124 (reverted at r255126) with a fixed check for a scalar source
type and an added test for the failure that caused the revert.
Original commit message:
Example:
bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
--->
extractelement <2 x float> %X, i32 1
This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543
The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)
Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232
with 2 less specific transforms to catch the case in the bug report.
Differential Revision: http://reviews.llvm.org/D14879
llvm-svn: 255261
|
| |
|
|
|
|
|
|
| |
Added some missing spaces between the module identifier and the start of
the debug message. Also added a ":" after the module identifier to make
this look a little nicer.
llvm-svn: 255259
|
| |
|
|
|
|
| |
Found by ubsan.
llvm-svn: 255258
|
| |
|
|
| |
llvm-svn: 255257
|
| |
|
|
|
|
|
| |
Ensure we release the files even when they don't hold a function index
summary section, by restructuring the control flow a little bit.
llvm-svn: 255256
|
| |
|
|
| |
llvm-svn: 255255
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A linker normally has two stages: symbol resolution and "moving stuff".
In lib/Linker there is the complication of lazy linking some globals,
but it was still far more mixed than it needed to.
This splits the linker into a lower level IRMover and the linker proper.
The IRMover just takes a list of globals to move and a callback that
lets the user control what is lazy linked.
The main motivation is that now tools/gold (and soon lld) can use their
own symbol resolution to instruct IRMover what to do.
llvm-svn: 255254
|
| |
|
|
|
|
| |
change.
llvm-svn: 255253
|
| |
|
|
|
|
| |
changes.
llvm-svn: 255252
|
| |
|
|
| |
llvm-svn: 255251
|
| |
|
|
| |
llvm-svn: 255250
|
| |
|
|
|
|
|
|
|
|
|
|
| |
We extend the search for redundant stores to predecessor blocks that
unconditionally lead to the block BB with the current store instruction. That
also includes single-block loops that unconditionally lead to BB, and
if-then-else blocks where then- and else-blocks unconditionally lead to BB.
http://reviews.llvm.org/D13363
Patch by Ivan Baev <ibaev@codeaurora.org>!
llvm-svn: 255247
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
http://reviews.llvm.org/D15286
LLVM IR frequently contains bitcast operations between floating point and
integer values of the same width. Doing this through memory operations is
quite expensive on PPC. This patch allows the use of direct register moves
between FPRs and GPRs for lowering bitcasts.
llvm-svn: 255246
|
| |
|
|
|
|
|
|
| |
Introduced DIMacro and DIMacroFile debug info metadata in the LLVM IR to support macros.
Differential Revision: http://reviews.llvm.org/D14687
llvm-svn: 255245
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
LAA uses the PredicatedScalarEvolution interface, so it can produce
forward/backward dependences having SCEVs that are AddRecExprs only after being
transformed by PredicatedScalarEvolution.
Use PredicatedScalarEvolution to get the expected expressions.
Reviewers: anemet
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D15382
llvm-svn: 255241
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
SystemZ needs to do its scheduling after branch relaxation, which can
only happen after block placement, and therefore the standard
PostRAScheduler point in the pass sequence is too early.
TargetMachine::targetSchedulesPostRAScheduling() is a new method that
signals on returning true that target will insert the final scheduling
pass on its own.
Reviewed by Hal Finkel
llvm-svn: 255234
|