doesn't have to guess.
llvm-svn: 103194

llvm-svn: 103193

llvm-svn: 102577

otherwise labels get incorrectly merged. We handled this by emitting a
".byte 0", but this isn't correct on thumb/arm targets where the text segment
needs to be a multiple of 2/4 bytes. Handle this by emitting a noop. This
is more gross than it should be because arm/ppc are not fully mc'ized yet.
This fixes rdar://7908505
llvm-svn: 102400

llvm-svn: 102326

and rename it to emitFrameIndexDebugValue.
- Teach spiller to modify DBG_VALUE instructions to reference spill slots.
llvm-svn: 102323

llvm-svn: 101334

instruction being optimized. There is no need to --I, which can deref off the start of the BB.
llvm-svn: 101162
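For illustration, the pitfall being avoided is decrementing an iterator that is
already at the start of its block, which is undefined behavior. A minimal
standalone C++ sketch of the guarded pattern (std::list stands in for a
MachineBasicBlock here; this is not the code from the commit):

  #include <cstdio>
  #include <iterator>
  #include <list>

  // Look at the "instruction" before I, but only if one exists.
  // Decrementing an iterator that equals begin() is undefined behavior.
  void inspectPrevious(std::list<int> &Block, std::list<int>::iterator I) {
    if (I == Block.begin()) {
      std::printf("no earlier instruction in this block\n");
      return;
    }
    std::list<int>::iterator Prev = std::prev(I); // safe: I != begin()
    std::printf("previous instruction: %d\n", *Prev);
  }

  int main() {
    std::list<int> Block = {10, 20, 30};
    inspectPrevious(Block, Block.begin());            // guarded case
    inspectPrevious(Block, std::next(Block.begin())); // prints 10
  }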

in a nightly tester.
llvm-svn: 101158

If we have this situation:
  jCC L1
  jmp L2
L1:
  ...
L2:
  ...
We can get a small performance boost by emitting this instead:
  jnCC L2
L1:
  ...
L2:
  ...
This testcase shows an example of this:
  float func(float x, float y) {
    double product = (double)x * y;
    if (product == 0.0)
      return product;
    return product - 1.0;
  }
llvm-svn: 101075

llvm-svn: 100709

DBG_VALUE does not generate code.
llvm-svn: 100681

Operand 2 on a load instruction does not have to be a RegisterSDNode for this to
work.
llvm-svn: 100497

llvm-svn: 100214

folder to be tolerant of debug info following the
branch(es) at the end of a block.
llvm-svn: 100168

llvm-svn: 99975

SSEDomainFix will collapse to the domain with the lower number when it has a
choice. The SSEPackedSingle domain often has smaller instructions, so prefer
that.
llvm-svn: 99952

Rewrite the pmulld patterns, and make sure that they fold in loads of
arguments into the instruction.
llvm-svn: 99910

Cross-block inference is primitive and wrong, but the pass is working otherwise.
llvm-svn: 99848

crossings.
On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a register
in a different domain than where it was defined. Some instructions have
equivalents for different domains, like por/orps/orpd.
The SSEDomainFix pass tries to minimize the number of domain crossings by
changing between equivalent opcodes where possible.
This is a work in progress; in particular, the pass doesn't do anything yet. SSE
instructions are tagged with their execution domain in TableGen using the last
two bits of TSFlags. Note that not all instructions are tagged correctly. Life
just isn't that simple.
The SSE execution domain issue is very similar to the ARM NEON/VFP pipeline
issue handled by NEONMoveFixPass. This pass may become target independent to
handle both.
llvm-svn: 99524
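As a rough standalone illustration of the tagging scheme described above, the
sketch below packs a two-bit execution-domain value into a TSFlags word and
reads it back. The enum names, bit positions, and helper are assumptions made
for this example, not the actual LLVM definitions:

  #include <cstdint>
  #include <cstdio>

  // Hypothetical execution domains, in the spirit of por/orps/orpd.
  enum ExecutionDomain : unsigned {
    NotSSEDomain = 0, // instruction does not care about domains
    PackedSingle = 1, // e.g. orps
    PackedDouble = 2, // e.g. orpd
    PackedInt    = 3  // e.g. por
  };

  // Assumed layout for this sketch: the domain lives in the top two bits.
  constexpr unsigned DomainShift = 30;
  constexpr uint32_t DomainMask  = 0x3u << DomainShift;

  ExecutionDomain getExecutionDomain(uint32_t TSFlags) {
    return static_cast<ExecutionDomain>((TSFlags & DomainMask) >> DomainShift);
  }

  int main() {
    uint32_t OrpsFlags = uint32_t(PackedSingle) << DomainShift;
    uint32_t OrpdFlags = uint32_t(PackedDouble) << DomainShift;
    // A domain-fix style pass would compare the domains of neighbouring
    // instructions and switch to an equivalent opcode to avoid crossings.
    std::printf("orps domain=%u orpd domain=%u\n",
                getExecutionDomain(OrpsFlags), getExecutionDomain(OrpdFlags));
  }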

domain crossings."
This reverts commit 99345. It was breaking buildbots.
llvm-svn: 99352

crossings.
This is work in progress. So far, SSE execution domain tables are added to
X86InstrInfo, and a skeleton pass is enabled with -sse-domain-fix.
llvm-svn: 99345

MachineBasicBlock::iterator that does this automatically?
llvm-svn: 99320

support to allow loads to be folded to tail call instructions.
llvm-svn: 98465

large code models.
llvm-svn: 98042

llvm-svn: 97768

This code:
  float floatingPointComparison(float x, float y) {
    double product = (double)x * y;
    if (product == 0.0)
      return product;
    return product - 1.0;
  }
produces this:
_floatingPointComparison:
0000000000000000 cvtss2sd %xmm1,%xmm1
0000000000000004 cvtss2sd %xmm0,%xmm0
0000000000000008 mulsd %xmm1,%xmm0
000000000000000c pxor %xmm1,%xmm1
0000000000000010 ucomisd %xmm1,%xmm0
0000000000000014 jne 0x00000004
0000000000000016 jp 0x00000002
0000000000000018 jmp 0x00000008
000000000000001a addsd 0x00000006(%rip),%xmm0
0000000000000022 cvtsd2ss %xmm0,%xmm0
0000000000000026 ret
The "jne/jp/jmp" sequence can be reduced to this instead:
_floatingPointComparison:
0000000000000000 cvtss2sd %xmm1,%xmm1
0000000000000004 cvtss2sd %xmm0,%xmm0
0000000000000008 mulsd %xmm1,%xmm0
000000000000000c pxor %xmm1,%xmm1
0000000000000010 ucomisd %xmm1,%xmm0
0000000000000014 jp 0x00000002
0000000000000016 je 0x00000008
0000000000000018 addsd 0x00000006(%rip),%xmm0
0000000000000020 cvtsd2ss %xmm0,%xmm0
0000000000000024 ret
for a savings of 2 bytes.
This xform can happen when we recognize that jne and jp jump to the same "true"
MBB, the unconditional jump would jump to the "false" MBB, and the "true" branch
is the fall-through MBB.
llvm-svn: 97766

Extracting the low element of a vector is now done with EXTRACT_SUBREG,
and the zero-extension performed by load movss is now modeled with
SUBREG_TO_REG, and so on.
Register-to-register movss and movsd are no longer considered copies;
they are two-address instructions which insert a scalar into a vector.
llvm-svn: 97354

llvm-svn: 97227

llvm-svn: 96778

This will work better for the disassembler when modeling things
like lfence/monitor/vmcall etc.
llvm-svn: 95960

use a multipattern that generates both the 1-byte and 4-byte
versions from the same defm
llvm-svn: 95901

into TargetOpcodes.h. #include the new TargetOpcodes.h
into MachineInstr. Add new inline accessors (like isPHI())
to MachineInstr, and start using them throughout the
codebase.
llvm-svn: 95687
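At call sites the change reads roughly like the sketch below. MachineInstr and
TargetOpcode are mocked with minimal stand-ins so the snippet compiles on its
own; in LLVM the idiom is MI->isPHI() instead of comparing getOpcode() against
the PHI opcode directly:

  #include <cstdio>

  // Minimal stand-ins for the real LLVM types, for illustration only.
  namespace TargetOpcode { enum { PHI = 0, COPY = 1 }; }

  struct MachineInstr {
    int Opcode;
    int getOpcode() const { return Opcode; }
    // Convenience accessor in the spirit of the ones added by this commit.
    bool isPHI() const { return getOpcode() == TargetOpcode::PHI; }
  };

  int main() {
    MachineInstr MI{TargetOpcode::PHI};
    if (MI.getOpcode() == TargetOpcode::PHI) // old style: raw opcode compare
      std::printf("raw opcode check says PHI\n");
    if (MI.isPHI())                          // new style: named accessor
      std::printf("accessor says PHI\n");
  }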

llvm-svn: 95440

llvm-svn: 95410

TSFlags directly instead of a TargetInstrDesc.
llvm-svn: 95405

llvm-svn: 94477

llvm-svn: 94254

scheduled together.
llvm-svn: 94147

unaligned load so it doesn't require 16-byte alignment.
llvm-svn: 94058

llvm-svn: 94032

more cases where debug declarations affect
debug line info.
llvm-svn: 93953

function can support dynamic stack realignment. That's a much easier question
to answer at the instruction selection stage than whether the function actually
will have a dynamic alignment prologue. This allows the removal of the
stack alignment heuristic pass, and improves code quality for cases where
the heuristic would result in dynamic alignment code being generated when
it was not strictly necessary.
llvm-svn: 93885
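The distinction here is between "can this function's stack be realigned at
all" (answerable during instruction selection) and "will it actually need
realignment" (which depends on the largest stack object, only known later).
A standalone sketch of that split, with invented names rather than the real
TargetRegisterInfo/MachineFrameInfo hooks:

  #include <algorithm>
  #include <cstdio>
  #include <vector>

  // Invented stand-in for per-function frame information.
  struct FrameInfo {
    bool HasVarSizedObjects;            // e.g. dynamically sized allocas
    std::vector<unsigned> ObjectAligns; // alignment of each stack object
  };

  // "Can we realign?" -- a property of the function itself, cheap to answer
  // early (during instruction selection in the commit's scheme).
  bool canRealignStack(const FrameInfo &FI) {
    return !FI.HasVarSizedObjects; // dynamic allocas block realignment here
  }

  // "Will we realign?" -- only known once every stack object exists.
  bool needsRealignment(const FrameInfo &FI, unsigned StackAlign) {
    unsigned MaxAlign = 0;
    for (unsigned A : FI.ObjectAligns)
      MaxAlign = std::max(MaxAlign, A);
    return canRealignStack(FI) && MaxAlign > StackAlign;
  }

  int main() {
    FrameInfo FI{false, {4, 16, 32}};
    std::printf("can realign: %d  needs realignment at 16: %d\n",
                canRealignStack(FI), needsRealignment(FI, 16));
  }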

32-bit.
llvm-svn: 93307

where the pre-extension values are available in the subreg of the result of the extension, replace the uses of the pre-extension value with the result + extract_subreg.
For now, this pass is fairly conservative. It only performs the replacement when both the pre- and post-extension values are used in the block. It will miss cases where the post-extension values are live, but not used.
llvm-svn: 93278

llvm-svn: 93229

instruction is copy-like where the source and destination registers can
overlap. This is to be used by the coalescer to coalesce the source and
destination registers of instructions like X86::MOVSX64rr32. Apparently
some crazy people believe the coalescer is too simple.
llvm-svn: 93210
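A hedged sketch of the kind of query being added: given an instruction whose
destination fully contains the source value (a plain copy or a widening move
like MOVSX64rr32), report the register pair so a coalescer could merge them.
The hook name, opcode enum, and MachineInstr type below are invented for the
example:

  #include <cstdio>

  enum Opcode { MOV64rr, MOVSX64rr32, ADD64rr }; // invented opcode set

  struct MachineInstr {
    Opcode Op;
    unsigned DstReg, SrcReg;
  };

  // True if the instruction behaves like a copy for coalescing purposes
  // (the destination holds the source value, possibly extended), reporting
  // which two registers could be coalesced.
  bool isCoalescableCopyLike(const MachineInstr &MI,
                             unsigned &SrcReg, unsigned &DstReg) {
    switch (MI.Op) {
    case MOV64rr:     // plain register-to-register copy
    case MOVSX64rr32: // sign extension: low 32 bits of Dst equal Src
      SrcReg = MI.SrcReg;
      DstReg = MI.DstReg;
      return true;
    default:
      return false;
    }
  }

  int main() {
    MachineInstr MI{MOVSX64rr32, /*DstReg=*/1, /*SrcReg=*/2};
    unsigned Src, Dst;
    if (isCoalescableCopyLike(MI, Src, Dst))
      std::printf("coalesce candidate: dst=%u src=%u\n", Dst, Src);
  }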

llvm-svn: 93185

new AsmPrinter. This is perhaps less elegant than describing them
new AsmPrinter. This is perhaps less elegant than describing them
in terms of MOV32r0 and subreg operations, but it allows the
current register allocator to rematerialize them.
llvm-svn: 93158

llvm-svn: 92653