missing barcelona CPU which that test uncovered, and remove the 32-bit
x86 CPUs which I really wasn't prepared to audit and test thoroughly.
If anyone wants to clean up the 32-bit only x86 CPUs, go for it.
Also, if anyone else wants to try to de-duplicate the AMD CPUs, that'd
be cool, but from the looks of it wouldn't save as much as it did for
the Intel CPUs.
llvm-svn: 223774
Instructions of the form [ADD Rd, pc, #imm] are manually aliased
in processInstruction() to use ADR. To accommodate this, mod_imm handling
had to be tweaked a bit. It turns out it is the manual aliasing that must
be tweaked to accommodate mod_imms instead. More information about the
parsed instruction is available at the point where processInstruction()
is invoked, which makes it easier to detect a mod_imm there rather
than trying to detect a potential alias while a mod_imm is being prepped.
Added a test case and fixed some whitespace as well.
llvm-svn: 223772
llvm-svn: 223770
llvm-svn: 223768
Removed some duplicate test cases from the file /test/Transforms/InstCombine/shift.ll.
test54 and test57 were duplicates of each other.
test55 and test58 were duplicates of each other.
(Removed test57 and test58)
llvm-svn: 223767
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.
Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.
All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.
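For illustration, here is roughly the kind of rewrite this enables on
a pair of floats; the IR below is a hand-written sketch, not taken from
the patch's tests:
;;
; Before: the two floats are glued together with integer math.
define void @store_pair(float %re, float %im, i64* %p) {
  %re.bits = bitcast float %re to i32
  %im.bits = bitcast float %im to i32
  %re.ext = zext i32 %re.bits to i64
  %im.ext = zext i32 %im.bits to i64
  %im.shl = shl i64 %im.ext, 32
  %both = or i64 %im.shl, %re.ext
  store i64 %both, i64* %p
  ret void
}
; After: an insertelement chain and a vector store, no integer math.
define void @store_pair(float %re, float %im, i64* %p) {
  %v0 = insertelement <2 x float> undef, float %re, i32 0
  %v1 = insertelement <2 x float> %v0, float %im, i32 1
  %vp = bitcast i64* %p to <2 x float>*
  store <2 x float> %v1, <2 x float>* %vp
  ret void
}
;;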
With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.
For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
integer loads and stores. SSA values are tremendously more powerful
than "copy" intrinsics. Not doing this regresses massive amounts of
LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
SROA or every memcpy of a trivially copyable struct will prevent SSA
formation of the members of that struct. It essentially turns off
SROA.
- The closest alternative is to actually split the loads and stores when
partitioning with SROA, but this has all of the downsides historically
discussed of splitting up loads and stores -- the wide-store
information is fundamentally lost. We would also see performance
regressions for bitfield-heavy code and other places where the
integers aren't really intended to be split without seemingly
arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
a choice to make IMO.
Differential Revision: http://reviews.llvm.org/D6548
llvm-svn: 223764
This handles the simplest case for mov -> push conversion:
1. x86-32 calling convention, everything is passed through the stack.
2. There is no reserved call frame.
3. Only registers or immediates are pushed, no attempt to combine a mem-reg-mem sequence into a single PUSHmm.
Differential Revision: http://reviews.llvm.org/D6503
llvm-svn: 223757
The commit is identical except a reference to `GV' should have been to
`GVal'.
llvm-svn: 223756
This reverts commit r223754. I've upset the buildbots.
llvm-svn: 223755
Don't assume that the forward-referenced entity was of the same
global-kind as the new entity.
This fixes PR21779.
llvm-svn: 223754
The aggressive anti-dep breaker, used by the PowerPC backend during post-RA
scheduling (but available to all targets), did not handle early-clobber MI
operands at all. Now, when constructing the list of available registers for
the replacement of some def operand, we check the using instructions and
remove registers assigned to early-clobbered defs from the set.
Fixes PR21452.
llvm-svn: 223727
This fixes an issue with ScheduleDAGInstrs::buildSchedGraph
where stores without an underlying object would not be added
as predecessors to the current BarrierChain.
llvm-svn: 223717
forms, mask, and vitpack instructions and patterns.
llvm-svn: 223710
GCC accepts 'cc' as an alias for 'cr0', and we need to do the same when
processing inline asm constraints. This had previously been implemented using a
non-allocatable register, named 'cc', that was listed as an alias of 'cr0', but
the infrastructure does not seem to support this properly (neither the register
allocator nor the scheduler properly accounts for the alias). Instead, we can
just process this as a naming alias inside of the inline asm
constraint-processing code, so we'll do that instead.
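For example (a hypothetical sketch, not one of the new tests), an IR-level
clobber spelled 'cc' is now understood to mean cr0:
;;
define void @f() {
  call void asm sideeffect "cmpw 0, 3, 4", "~{cc}"()
  ret void
}
;;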
There are two regression tests, one where the post-RA scheduler did the wrong
thing with the non-allocatable alias, and one where the register allocator did
the wrong thing. Fixes PR21742.
llvm-svn: 223708
llvm-svn: 223704
llvm-svn: 223702
llvm-svn: 223701
llvm-svn: 223693
llvm-svn: 223692
A zero-sized array occupies no storage and might share its address with
another global.
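For example (a hand-written sketch):
;;
@empty = global [0 x i32] zeroinitializer
@word = global i32 0
; @empty may be laid out at the same address as @word, so an equality
; comparison between their addresses must not be folded to false.
;;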
llvm-svn: 223684
We were already lazily linking functions, but all GlobalValues can be treated
uniformly for this.
The test updates are to ensure that a given GlobalValue is still linked in.
This fixes PR21494.
llvm-svn: 223681
shift patterns.
llvm-svn: 223680
This reverts r223624 with a small tweak; hopefully this will make stage3
equivalent.
llvm-svn: 223679
Fix a compact unwind encoding logic bug which would try to encode
more callee-saved registers than it should, leading to an early bail out
in the encoding logic and unnecessary use of DWARF frame mode.
Also remove no-compact-unwind.ll, which was testing the wrong thing
because of this bug, and move it to the valid 'compact unwind' tests.
Added a few more tests too.
llvm-svn: 223676
llvm-svn: 223673
Introduce the ``llvm.instrprof_increment`` intrinsic and the
``-instrprof`` pass. These provide the infrastructure for writing
counters for profiling, as in clang's ``-fprofile-instr-generate``.
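As a rough sketch of a lowered call (the name global and operand values
here are invented for illustration):
;;
@__prof_name_foo = private constant [3 x i8] c"foo"

define void @foo() {
  ; increment counter 0 of 1 for "foo"; the i64 hash operand is arbitrary here
  call void @llvm.instrprof_increment(
      i8* getelementptr inbounds ([3 x i8]* @__prof_name_foo, i32 0, i32 0),
      i64 0, i32 1, i32 0)
  ret void
}

declare void @llvm.instrprof_increment(i8*, i64, i32, i32)
;;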
The implementation of the instrprof pass is ported directly out of the
CodeGenPGO classes in clang; with the follow-up in clang that rips
that code out to use these new intrinsics, this ends up being NFC.
Doing the instrumentation this way opens some doors in terms of
improving the counter performance. For example, this will make it
simple to experiment with alternate lowering strategies, and allows us
to try handling profiling specially in some optimizations if we want
to.
Finally, this drastically simplifies the frontend and puts all of the
lowering logic in one place.
llvm-svn: 223672
llvm-svn: 223669
Teach ISel how to match a TZCNT/LZCNT from a conditional move if the
condition code is X86_COND_NE.
Existing TableGen patterns only allowed matching TZCNT/LZCNT from an
X86cond with condition code equal to X86_COND_E. To avoid introducing
extra rules, I added an 'ImmLeaf' definition that checks if the
condition code is COND_E or COND_NE.
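A sketch of the kind of IR that can now be selected to a single TZCNT
(illustrative, not one of the added tests):
;;
define i32 @cttz32(i32 %x) {
  %cnt = tail call i32 @llvm.cttz.i32(i32 %x, i1 true)
  %nz = icmp ne i32 %x, 0
  ; with X86_COND_NE the select keeps %cnt when %x is non-zero and falls
  ; back to 32, which is exactly what TZCNT produces for a zero input
  %res = select i1 %nz, i32 %cnt, i32 32
  ret i32 %res
}
declare i32 @llvm.cttz.i32(i32, i1)
;;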
llvm-svn: 223668
llvm-svn: 223667
Since the main file was empty, we can just copy the content of the Input file
into it.
llvm-svn: 223666
This is just testing the largest merge mode for comdats. No need to use
hard-to-read names and fancy types.
llvm-svn: 223665
llvm-svn: 223664
llvm-svn: 223663
Before this patch, the backend sub-optimally expanded the non-constant shift
count of a v8i16 shift into a sequence of two 'movd' plus 'movzwl'.
With this patch, the backend checks whether the target has SSE4.1. If so,
it lets the shuffle legalizer deal with the expansion of the shift amount.
Example:
;;
define <8 x i16> @test(<8 x i16> %A, <8 x i16> %B) {
%shamt = shufflevector <8 x i16> %B, <8 x i16> undef, <8 x i32> zeroinitializer
%shl = shl <8 x i16> %A, %shamt
ret <8 x i16> %shl
}
;;
Before (with -mattr=+avx):
vmovd %xmm1, %eax
movzwl %ax, %eax
vmovd %eax, %xmm1
vpsllw %xmm1, %xmm0, %xmm0
retq
Now:
vpxor %xmm2, %xmm2, %xmm2
vpblendw $1, %xmm1, %xmm2, %xmm1
vpsllw %xmm1, %xmm0, %xmm0
retq
llvm-svn: 223660
It would crash when the function was lazily linked.
llvm-svn: 223656
llvm-svn: 223648
between stage2 (gcc-clang) and stage3 clang. Investigating.
llvm-svn: 223624
As a fixup to r223616, follow the convention of naming the files after
the LLVM release whose bitcode they're maintaining compatibility with.
llvm-svn: 223623
Add assembly and bitcode tests that I neglected to add in r223564 (IR:
Disallow complicated function-local metadata) and r223574 (IR: Disallow
function-local metadata attachments).
Found a couple of bugs:
- The error message for function-local attachments gave the wrong line
number -- it indicated the next token (typically on the next line)
instead of the token that started the attachment. Fixed.
- Metadata arguments of the form `!{i32 0, i32 %v}` (or with the
arguments reversed) fired an assertion in `ValueEnumerator` in LLVM
v3.5, so I suppose this never really worked. Presumably it was
"fixed" by r223564.
(Thanks to dblaikie for pointing out my omission.)
Part of PR21532.
llvm-svn: 223616
matching offsets. I don't expect this to really matter, but it's what the
latest incarnation of my script for maintaining these tests happens to
produce, and so it's simpler for me if everything matches.
llvm-svn: 223613
store to real pointers so that it's clear that the right code is in fact
being generated.
llvm-svn: 223612
identical checks for different SSE variants into a single block.
llvm-svn: 223611
script. Notably, this folds all the SSE cases together into a single
FileCheck block. It also adds a VEX prefix.
llvm-svn: 223610
Consider:
void f() {}
void __attribute__((weak)) g() {}
bool b = &f != &g;
It's possible for g to resolve to f if --defsym=g=f is passed on to the
linker.
llvm-svn: 223585
Code like X < Y && Y == 0 should always be folded away to false.
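For instance (a hand-written sketch of the unsigned case):
;;
define i1 @test(i32 %x, i32 %y) {
  %c1 = icmp ult i32 %x, %y
  %c2 = icmp eq i32 %y, 0
  ; %x <u %y implies %y != 0, so the conjunction simplifies to false
  %r = and i1 %c1, %c2
  ret i1 %r
}
;;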
llvm-svn: 223583
This was changed in r223323.
llvm-svn: 223579
Metadata attachments to instructions cannot be function-local.
This is part of PR21532.
llvm-svn: 223574
llvm-svn: 223571
labels have a prefix "." for targeting i686-cygming.
llvm-svn: 223570
Most patterns will go away once the extload legalization changes land.
Differential Revision: http://reviews.llvm.org/D6125
llvm-svn: 223567