| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
| |
passed to it. Delete it on error or when we create an interpreter that doesn't need it.
llvm-svn: 154288
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
where a chain outside of the loop block-set ended up in the worklist for
scheduling as part of the contiguous loop. However, asserting the first
block in the chain is in the loop-set isn't a valid check -- we may be
forced to drag a chain into the worklist due to one block in the chain
being part of the loop even though the first block is *not* in the loop.
This occurs when we have been forced to form a chain early due to
un-analyzable branches.
No test case here as I have no idea how to even begin reducing one, and
it will be hopelessly fragile. We have to somehow end up with a loop
header of an inner loop which is a successor of a basic block with an
unanalyzable pair of branch instructions. Ow. Self-host triggers it so
it is unlikely it will regress.
This at least gets block placement back to passing selfhost and the test
suite. There are still a lot of slowdowns that I don't like coming out of
block placement, although there are now also a lot of speedups. =[ I'm
seeing swings in both directions up to 10%. I'm going to try to find
time to dig into this and see if we can turn this on for 3.1 as it does
a really good job of cleaning up after some loops that degraded with the
inliner changes.
llvm-svn: 154287
|
| |
|
|
|
|
| |
debugging.
llvm-svn: 154286
|
| |
|
|
|
|
|
|
|
|
|
|
| |
GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).
llvm-svn: 154285
|
| |
|
|
|
|
|
|
| |
Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.
llvm-svn: 154284
|
| |
|
|
|
|
|
|
|
|
|
|
| |
An MDNode has a list of MDNodeOperands allocated directly after it as part of
its allocation. Therefore, the Parent of the MDNodeOperands can be found by
walking back through the operands to the beginning of that list. Mark the first
operand's value pointer as being the 'first' operand so that we know where the
beginning of said list is.
This saves a *lot* of space during LTO with -O0 -g flags.
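A minimal sketch of the layout described above, not the actual MDNode code: the operands are allocated directly after the node, the first operand carries a flag in a spare low bit of its value pointer, and any operand finds its parent by walking back to the flagged one. The names `Node`, `Operand`, `createNode`, `operandsOf` and `destroyNode` are hypothetical, and the sketch assumes value pointers are at least 4-byte aligned.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical node header; aligned so the operand array can follow it directly.
struct alignas(std::uintptr_t) Node {
  unsigned NumOperands;
};

// Hypothetical operand: a value pointer with the low bit reused as a
// "first operand" flag, so no separate parent pointer is stored.
class Operand {
  std::uintptr_t Bits = 0;
  static constexpr std::uintptr_t FirstFlag = 1;
  static constexpr std::uintptr_t FlagMask = 3; // two low bits reserved

public:
  void set(void *V, bool IsFirst) {
    auto P = reinterpret_cast<std::uintptr_t>(V);
    assert((P & FlagMask) == 0 && "value pointer must leave two low bits free");
    Bits = P | (IsFirst ? FirstFlag : 0);
  }
  void *value() const { return reinterpret_cast<void *>(Bits & ~FlagMask); }
  bool isFirst() const { return (Bits & FirstFlag) != 0; }

  // Walk back to the flagged first operand; the node sits just before it.
  Node *parent() {
    Operand *Cur = this;
    while (!Cur->isFirst())
      --Cur;
    return reinterpret_cast<Node *>(Cur) - 1;
  }
};

// Allocate the node and its operands as one block.
Node *createNode(void *const *Vals, unsigned N) {
  assert(N > 0 && "sketch requires at least one operand");
  void *Mem = std::malloc(sizeof(Node) + N * sizeof(Operand));
  assert(Mem && "allocation failed");
  Node *Nd = new (Mem) Node{N};
  Operand *Ops = reinterpret_cast<Operand *>(Nd + 1);
  for (unsigned I = 0; I != N; ++I)
    (new (Ops + I) Operand())->set(Vals[I], /*IsFirst=*/I == 0);
  return Nd;
}

Operand *operandsOf(Node *N) { return reinterpret_cast<Operand *>(N + 1); }

void destroyNode(Node *N) { std::free(N); }
```

The saving comes from dropping one pointer per operand; the cost is a walk proportional to the operand's index whenever the parent is needed.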
llvm-svn: 154280
|
| |
|
|
|
|
|
| |
value pointer by making the value pointer into a pointer-int pair with 2 bits
available for flags.
llvm-svn: 154279
|
| |
|
|
|
|
| |
remove patterns for selecting the intrinsic. Similar was already done for avx1.
llvm-svn: 154272
|
| |
|
|
|
|
| |
AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.
llvm-svn: 154268
|
| |
|
|
| |
llvm-svn: 154267
|
| |
|
|
|
|
|
|
|
|
| |
shuffle node because it could introduce new shuffle nodes that were not
supported efficiently by the target.
2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
second shuffle reverses the transformation of the first shuffle.
llvm-svn: 154266
|
| |
|
|
|
|
|
|
| |
reciprocal if converting to the reciprocal is exact. Do it even if inexact
if -ffast-math. This substantially speeds up ac.f90 from the polyhedron
benchmarks.
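As a rough illustration of the exactness condition, and not the code from this commit: for binary floating point, the reciprocal of a constant is exact when the constant is a power of two and the reciprocal does not overflow or go subnormal. The helper names `hasExactReciprocal` and `divideByConstant`, and the `AllowInexact` flag standing in for -ffast-math, are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cmath>

// True if 1/C is exactly representable as a normal double. Conservative:
// only finite, nonzero powers of two with a normal reciprocal qualify.
bool hasExactReciprocal(double C) {
  if (!std::isfinite(C) || C == 0.0)
    return false;
  int Exp = 0;
  double Mant = std::frexp(C, &Exp); // C == Mant * 2^Exp, 0.5 <= |Mant| < 1
  if (std::fabs(Mant) != 0.5)
    return false; // not a power of two
  return std::isnormal(1.0 / C);
}

// Divide by a constant, using a multiply when that is exact, or when the
// caller has opted into inexact results (the fast-math case).
double divideByConstant(double X, double C, bool AllowInexact) {
  if (hasExactReciprocal(C) || AllowInexact)
    return X * (1.0 / C);
  return X / C;
}
```

For example, dividing by 4.0 becomes a multiply by 0.25 with identical results, while dividing by 3.0 stays a division unless `AllowInexact` is set.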
llvm-svn: 154265
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
speculate. Without this, loop rotate (among many other places) would
suddenly stop working in the presence of debug info. I found this
looking at loop rotate, and have augmented its tests with a reduction
out of a very hot loop in yacr2 where failing to do this rotation costs
sometimes more than 10% in runtime performance, perturbing numerous
downstream optimizations.
This should have no impact on performance without debug info, but the
change in performance when debug info is enabled can be extreme. As
a consequence (and this is how I got to this yak) any profiling of
performance problems should be treated with deep suspicion -- they may
have been wildly inaccurate if debug info was enabled for profiling. =/
Just a heads up.
llvm-svn: 154263
|
| |
|
|
|
|
|
|
| |
but not NSW.
Found by inspection.
llvm-svn: 154262
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tLDRr instruction with the last register operand set to the zero register
prints in assembly as if no register was specified, and the assembler encodes
it as a tLDRi instruction with a zero immediate. With the integrated assembler,
that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which
is broken. Emit the instruction as tLDRi with a zero immediate. I don't
know if there's a good way to write a testcase for this. Suggestions welcome.
Opportunities for follow-up work:
1) The asm printer should complain if a non-optional register operand is set
to the zero register, instead of silently dropping it.
2) The integrated assembler should complain in the same situation, instead of
silently emitting the operand as "r0".
llvm-svn: 154261
|
| |
|
|
| |
llvm-svn: 154249
|
| |
|
|
|
|
|
| |
Cygwin-1.7 supports dw2. Some recent mingw distros support one, too.
I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin.
llvm-svn: 154247
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
by default.
This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching the default seems
reasonable.
I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.
llvm-svn: 154235
|
| |
|
|
| |
llvm-svn: 154226
|
| |
|
|
| |
llvm-svn: 154210
|
| |
|
|
|
|
|
| |
After register masks were introduced to represent the call clobbers, it
is no longer necessary to have duplicate instructions for iOS.
llvm-svn: 154209
|
| |
|
|
|
|
| |
which exists for this purpose.
llvm-svn: 154199
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
disassembler requires a MCSubtargetInfo and a
MCInstrInfo to exist in order to initialize the
instruction printer and disassembler; however,
although the printer and disassembler keep
references to these objects they do not own them.
Previously, the MCSubtargetInfo and MCInstrInfo
objects were just leaked.
I have extended LLVMDisasmContext to own these
objects and delete them when it is destroyed.
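The ownership arrangement can be sketched as follows. This is an illustration under assumptions, not the LLVMDisasmContext code: `SubtargetInfo`, `InstrInfo`, `InstPrinter` and `DisasmContext` are hypothetical stand-ins, and `std::unique_ptr` is used here only as one way to express "own and delete on destruction".

```cpp
#include <cassert>
#include <memory>

int LiveObjects = 0; // counts live info objects so leaks are observable

struct Counted {
  Counted() { ++LiveObjects; }
  ~Counted() { --LiveObjects; }
  Counted(const Counted &) = delete;
  Counted &operator=(const Counted &) = delete;
};

// Hypothetical stand-ins for the subtarget and instruction info objects.
struct SubtargetInfo : Counted {};
struct InstrInfo : Counted {};

// The printer keeps references but does not own what it refers to.
struct InstPrinter {
  const InstrInfo &MII;
  const SubtargetInfo &STI;
  InstPrinter(const InstrInfo &MII, const SubtargetInfo &STI)
      : MII(MII), STI(STI) {}
};

// The context owns the info objects, so they die with the context.
class DisasmContext {
  std::unique_ptr<SubtargetInfo> STI;
  std::unique_ptr<InstrInfo> MII;
  InstPrinter Printer; // declared last, so destroyed before what it references

public:
  DisasmContext()
      : STI(new SubtargetInfo()), MII(new InstrInfo()), Printer(*MII, *STI) {}
};
```

Declaring the borrowing member after the owning members matters: members are destroyed in reverse declaration order, so the printer never outlives the objects it references.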
llvm-svn: 154192
|
| |
|
|
|
|
|
| |
ARM and Thumb2 mode can use cmn instructions to compare against negative
immediates. Thumb1 mode can't.
llvm-svn: 154183
|
| |
|
|
|
|
|
|
| |
parameter until we have a more sensible API for doing the same thing.
Reviewed by Chandler.
llvm-svn: 154180
|
| |
|
|
|
|
|
|
|
|
|
| |
simplification has been performed. This is a bit less efficient
(requires another ilist walk of the basic blocks) but shouldn't matter
in practice. More importantly, it's just too much work to keep track of
all the various ways the return instructions can be mutated while
simplifying them. This fixes yet another crasher, reported by Daniel
Dunbar.
llvm-svn: 154179
|
| |
|
|
|
|
| |
The modifications are a lot more trivial than they appear to be in the diff!
llvm-svn: 154174
|
| |
|
|
| |
llvm-svn: 154171
|
| |
|
|
|
|
| |
a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413.
llvm-svn: 154166
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
dead code, including dead return instructions in some cases. Otherwise,
we end up having a bogus pointer to a return instruction that blows up
much further down the road.
It turns out that this pattern is both simpler to code, easier to update
in the face of enhancements to the inliner cleanup, and likely cheaper
given that it won't add dead instructions to the list.
Thanks to John Regehr's numerous test cases for teasing this out.
llvm-svn: 154157
|
| |
|
|
|
|
|
|
| |
We had special instructions for iOS because r9 is call-clobbered, but
that is represented dynamically by the register mask operands now, so
there is no need for the pseudo-instructions.
llvm-svn: 154144
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The load/store optimizer splits LDRD/STRD into two instructions when the
register pairing doesn't work out. For negative offsets in Thumb2, it uses
t2STRi8 to do that. That's fine, except for the case when the offset is in
the range [-4,-1]. In that case, we'll also form a second t2STRi8 with
the original offset plus 4, resulting in a t2STRi8 with a non-negative
offset, which ends up as if it were an STRT, which is completely bogus.
Similarly for loads.
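A small sketch of the offset arithmetic behind this bug, assuming the fix is to pick the encoding per half based on the sign of its own offset. The names `StoreForm`, `SplitStore`, `formFor` and `splitPairedStore` are hypothetical, and encodable-range checks are left out.

```cpp
#include <cassert>
#include <utility>

// Hypothetical labels for the two Thumb2 immediate store forms discussed
// above: an 8-bit negative-offset form and an unsigned 12-bit form.
enum class StoreForm { NegativeImm8, UnsignedImm12 };

struct SplitStore {
  int Offset;
  StoreForm Form;
};

// Choose the form from the sign of this half's own offset, not from the
// sign of the original paired access.
StoreForm formFor(int Offset) {
  return Offset < 0 ? StoreForm::NegativeImm8 : StoreForm::UnsignedImm12;
}

// Split a paired store at Offset into two word stores at Offset and Offset+4.
std::pair<SplitStore, SplitStore> splitPairedStore(int Offset) {
  int Second = Offset + 4;
  return {SplitStore{Offset, formFor(Offset)},
          SplitStore{Second, formFor(Second)}};
}
```

With an original offset of -4, the first half stays in the negative form while the second half lands at offset 0 and needs the unsigned form; for -8 both halves remain negative.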
No testcase, unfortunately, as any I've been able to construct is both large
and extremely fragile.
rdar://11193937
llvm-svn: 154141
|
| |
|
|
|
|
|
|
|
|
| |
'add r2, #-1024' should just use 'sub r2, #1024' rather than erroring out.
Thumb1 aliases for adding a negative immediate to the stack pointer,
also.
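The alias amounts to flipping the opcode and negating the immediate. A hedged sketch with hypothetical names (`AddSubImm`, `canonicalizeAddImmediate`), ignoring whether the resulting immediate is encodable:

```cpp
#include <cassert>
#include <cstdint>

struct AddSubImm {
  bool IsSub;        // true if the instruction should be emitted as 'sub'
  std::uint32_t Imm; // non-negative immediate to encode
};

// Turn 'add rX, #Imm' with a negative Imm into 'sub rX, #-Imm'.
AddSubImm canonicalizeAddImmediate(std::int32_t Imm) {
  if (Imm < 0) {
    // Negate in 64 bits so INT32_MIN does not overflow.
    auto Magnitude = static_cast<std::uint32_t>(-static_cast<std::int64_t>(Imm));
    return {true, Magnitude};
  }
  return {false, static_cast<std::uint32_t>(Imm)};
}
```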
rdar://11192734
llvm-svn: 154123
|
| |
|
|
|
|
|
|
|
| |
This enables debuggers to see what are interesting lines for a
breakpoint rather than any line that starts a function.
rdar://9852092
llvm-svn: 154120
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LSR always tries to make the ICmp in the loop latch use the incremented
induction variable. This allows the induction variable to be kept in a
single register.
When the induction variable limit is equal to the stride,
SimplifySetCC() would break LSR's hard work by transforming:
(icmp (add iv, stride), stride) --> (cmp iv, 0)
This forced us to use lea for the IV update, preventing the simpler
incl+cmp.
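To see why the rewrite is legal yet unhelpful, here is a sketch assuming an equality comparison on wrapping 32-bit values; the function names are hypothetical. Both forms compute the same predicate, but the first reuses the incremented value the loop needs anyway, while the second keeps the old induction variable live alongside the new one.

```cpp
#include <cassert>
#include <cstdint>

// Latch test on the incremented induction variable, as LSR arranges it.
bool latchCompareIncremented(std::uint32_t IV, std::uint32_t Stride) {
  std::uint32_t Next = IV + Stride; // the increment the loop performs anyway
  return Next == Stride;
}

// The algebraically equivalent form: (IV + Stride == Stride) <=> (IV == 0).
// It needs the pre-increment IV, so both IV and Next must stay live.
bool latchCompareRewritten(std::uint32_t IV) { return IV == 0; }
```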
<rdar://problem/7643606>
<rdar://problem/11184260>
llvm-svn: 154119
|
| |
|
|
|
|
| |
testcase slightly less trivial. This fixes rdar://11171718.
llvm-svn: 154118
|
| |
|
|
|
|
| |
during instruction selection.
llvm-svn: 154113
|
| |
|
|
|
|
| |
some corner cases involving the PC register as an operand for these instructions.
llvm-svn: 154101
|
| |
|
|
| |
llvm-svn: 154100
|
| |
|
|
|
|
| |
modify it.
llvm-svn: 154098
|
| |
|
|
|
|
|
|
|
| |
of the BBVectorizePass without using command line option. As pointed out
by Hal, we can ask the TargetLoweringInfo for the architecture specific
VectorizeConfig to perform vectorizing with architecture specific
information.
llvm-svn: 154096
|
| |
|
|
|
|
|
| |
BasicBlock in other passes, e.g. we can call vectorizeBasicBlock in the
loop unroll pass right after the loop is unrolled.
llvm-svn: 154089
|
| |
|
|
|
|
| |
rdar://11189467
llvm-svn: 154087
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
the caller requested a null-terminated one.
When mapping the file, a race could result in the file being larger
than the FileSize passed by the caller. We already have an assertion
for this in MemoryBuffer::init(), but to get a runtime guarantee that
the buffer will be null-terminated, do a copy that adds a null-terminator.
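The copying fallback can be sketched roughly as below. This is not the MemoryBuffer implementation; `copyWithNullTerminator` is a hypothetical helper that copies exactly the requested number of bytes and appends the terminator itself, so the result does not depend on what follows the data in the mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy Size bytes and guarantee a trailing '\0', whatever follows Data.
std::vector<char> copyWithNullTerminator(const char *Data, std::size_t Size) {
  std::vector<char> Buf(Size + 1); // value-initialized, so Buf[Size] == '\0'
  if (Size != 0)
    std::memcpy(Buf.data(), Data, Size);
  return Buf;
}
```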
Protects against crash of rdar://11161822.
llvm-svn: 154082
|
| |
|
|
|
|
|
|
| |
Plain 'cpsr' is an alias for 'cpsr_fc'.
rdar://11153753
llvm-svn: 154080
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LSR can fold three addressing modes into its ICmpZero node:
ICmpZero BaseReg + Offset => ICmp BaseReg, -Offset
ICmpZero -1*ScaleReg + Offset => ICmp ScaleReg, Offset
ICmpZero BaseReg + -1*ScaleReg => ICmp BaseReg, ScaleReg
The first two cases are only used if TLI->isLegalICmpImmediate() likes
the offset.
Make sure the right Offset sign is passed to this method in the second
case. The ARM version is not symmetric.
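The sign issue in the first two cases can be illustrated with a sketch. The legality rule below (only 0..255 is accepted) is an assumption chosen to be asymmetric; it is not the real ARM `isLegalICmpImmediate`, and `Fold`, `compareImmediate`, `isLegalCompareImmediate` and `canFold` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class Fold {
  BasePlusOffset,    // ICmpZero BaseReg + Offset     => ICmp BaseReg, -Offset
  NegScalePlusOffset // ICmpZero -1*ScaleReg + Offset => ICmp ScaleReg, Offset
};

// The immediate that actually appears in the final compare for each fold.
std::int64_t compareImmediate(Fold Kind, std::int64_t Offset) {
  switch (Kind) {
  case Fold::BasePlusOffset:
    return -Offset;
  case Fold::NegScalePlusOffset:
    return Offset;
  }
  return 0;
}

// Hypothetical asymmetric target rule: small non-negative immediates only.
bool isLegalCompareImmediate(std::int64_t Imm) {
  return Imm >= 0 && Imm <= 255;
}

// Legality must be asked about the immediate that will be emitted.
bool canFold(Fold Kind, std::int64_t Offset) {
  return isLegalCompareImmediate(compareImmediate(Kind, Offset));
}
```

Under this rule an Offset of -10 is foldable in the first case (the compare uses 10) but not in the second (the compare would use -10), which is why the sign passed to the legality check has to match the fold being applied.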
<rdar://problem/11184260>
llvm-svn: 154079
|
| |
|
|
| |
llvm-svn: 154062
|
| |
|
|
| |
llvm-svn: 154054
|
| |
|
|
|
|
| |
register indices on the source registers. No simple test case
llvm-svn: 154051
|
| |
|
|
|
|
| |
Still not fixed in the standard ;)
llvm-svn: 154044
|