| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
%al/ax/eax/rax with memory offset.
llvm-svn: 225078
|
|
|
|
|
|
|
|
|
|
| |
without using mode predicates.
This is necessary to allow the disassembler to be able to handle AdSize32 instructions in 64-bit mode when address size prefix is used.
Eventually we should probably also support 'addr32' and 'addr16' in the assembler to override the address size on some of these instructions. But for now we'll just use special operand types that will lookup the current mode size to select the right instruction.
llvm-svn: 225075
|
|
|
|
|
|
|
| |
Attempting to fix PR22078 (building on 32-bit systems) by replacing my careless
use of 1ul to be a uint64_t constant with UINT64_C(1).
llvm-svn: 225066
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the second installment of improvements to instruction selection for "bit
permutation" instruction sequences. r224318 added logic for instruction
selection for 32-bit bit permutation sequences, and this adds lowering for
64-bit sequences. The 64-bit sequences are more complicated than the 32-bit
ones because:
a) the 64-bit versions of the 32-bit rotate-and-mask instructions
work by replicating the lower 32-bits of the value-to-be-rotated into the
upper 32 bits -- and integrating this into the cost modeling for the various
bit group operations is non-trivial
b) unlike the 32-bit instructions in 32-bit mode, the rotate-and-mask instructions
cannot, in one instruction, specify the
mask starting index, the mask ending index, and the rotation factor. Also,
forming arbitrary 64-bit constants is more complicated than in 32-bit mode
because the number of instructions necessary is value dependent.
Plus, support for 'late masking' was added: it is sometimes more efficient to
treat the overall value as if it had no mandatory zero bits when planning the
bit-group insertions, and then mask them in at the very end. Unfortunately, as
the structure of the bit groups is different in the two cases, the more
feasible implementation technique was to generate both instruction sequences,
and then pick the shorter one.
And finally, we now generate reasonable code for i64 bswap:
rldicl 5, 3, 16, 0
rldicl 4, 3, 8, 0
rldicl 6, 3, 24, 0
rldimi 4, 5, 8, 48
rldicl 5, 3, 32, 0
rldimi 4, 6, 16, 40
rldicl 6, 3, 48, 0
rldimi 4, 5, 24, 32
rldicl 5, 3, 56, 0
rldimi 4, 6, 40, 16
rldimi 4, 5, 48, 8
rldimi 4, 3, 56, 0
vs. what we used to produce:
li 4, 255
rldicl 5, 3, 24, 40
rldicl 6, 3, 40, 24
rldicl 7, 3, 56, 8
sldi 8, 3, 8
sldi 10, 3, 24
sldi 12, 3, 40
rldicl 0, 3, 8, 56
sldi 9, 4, 32
sldi 11, 4, 40
sldi 4, 4, 48
andi. 5, 5, 65280
andis. 6, 6, 255
andis. 7, 7, 65280
sldi 3, 3, 56
and 8, 8, 9
and 4, 12, 4
and 9, 10, 11
or 6, 7, 6
or 5, 5, 0
or 3, 3, 4
or 7, 9, 8
or 4, 6, 5
or 3, 3, 7
or 3, 3, 4
which is 12 instructions, instead of 25, and seems optimal (at least in terms
of code size).
llvm-svn: 225056
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The issues was that AArch64 has additional restrictions on when local
relocations can be used. We have to take those into consideration when
deciding to put a L symbol in the symbol table or not.
Original message:
Remove doesSectionRequireSymbols.
In an assembly expression like
bar:
.long L0 + 1
the intended semantics is that bar will contain a pointer one byte past L0.
In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.
The solution used in ELF to use relocation with symbols if there is a non-zero
addend.
In MachO before this patch we would just keep all symbols in some sections.
This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.
This patch implements the non-zero addend logic for MachO too.
llvm-svn: 225048
|
|
|
|
| |
llvm-svn: 225047
|
|
|
|
|
|
| |
problem
llvm-svn: 225045
|
|
|
|
|
|
|
|
| |
This reverts commit r224985.
I am investigating why it made an Apple bot unhappy.
llvm-svn: 225044
|
|
|
|
|
|
| |
Relocations aren't implemented yet but we don't need to abort for this in release builds.
llvm-svn: 225043
|
|
|
|
|
|
| |
modes with all 4 combinations of OpSize and AdSize prefixes being present or not.
llvm-svn: 225036
|
|
|
|
| |
llvm-svn: 225035
|
|
|
|
|
|
| |
doubleword bitfield extract, word parity, accumulating multiplies with saturation.
llvm-svn: 225024
|
|
|
|
| |
llvm-svn: 225018
|
|
|
|
| |
llvm-svn: 225015
|
|
|
|
| |
llvm-svn: 225010
|
|
|
|
|
|
| |
immediate newvalue stores.
llvm-svn: 225009
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D6796
llvm-svn: 225008
|
|
|
|
| |
llvm-svn: 225007
|
|
|
|
| |
llvm-svn: 225006
|
|
|
|
| |
llvm-svn: 225005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Under the large code model, we cannot assume that __morestack lives within
2^31 bytes of the call site, so we cannot use pc-relative addressing. We
cannot perform the call via a temporary register, as the rax register may
be used to store the static chain, and all other suitable registers may be
either callee-save or used for parameter passing. We cannot use the stack
at this point either because __morestack manipulates the stack directly.
To avoid these issues, perform an indirect call via a read-only memory
location containing the address.
This solution is not perfect, as it assumes that the .rodata section
is laid out within 2^31 bytes of each function body, but this seems to
be sufficient for JIT.
Differential Revision: http://reviews.llvm.org/D6787
llvm-svn: 225003
|
|
|
|
| |
llvm-svn: 224997
|
|
|
|
| |
llvm-svn: 224992
|
|
|
|
|
|
| |
compare to general register reg-imm form.
llvm-svn: 224991
|
|
|
|
|
|
| |
compare to general register, and inverted compares.
llvm-svn: 224989
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In an assembly expression like
bar:
.long L0 + 1
the intended semantics is that bar will contain a pointer one byte past L0.
In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.
The solution used in ELF to use relocation with symbols if there is a non-zero
addend.
In MachO before this patch we would just keep all symbols in some sections.
This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.
This patch implements the non-zero addend logic for MachO too.
llvm-svn: 224985
|
|
|
|
|
|
| |
post-increment circular register stores, and bit reversed post-increment stores.
llvm-svn: 224957
|
|
|
|
| |
llvm-svn: 224955
|
|
|
|
|
|
| |
form stores with tests.
llvm-svn: 224952
|
|
|
|
|
|
| |
have encoding bits.
llvm-svn: 224951
|
|
|
|
|
|
| |
classes and instruction defs.
llvm-svn: 224949
|
|
|
|
|
|
| |
convertible to three address instructions, but aren't really.
llvm-svn: 224940
|
|
|
|
|
|
| |
functionality to the 0x80 opcode instructions, but are not valid in 64-bit mode.
llvm-svn: 224939
|
|
|
|
|
|
| |
for another change. NFC.
llvm-svn: 224938
|
|
|
|
| |
llvm-svn: 224937
|
|
|
|
|
|
| |
Patch by Michael Neumann.
llvm-svn: 224936
|
|
|
|
|
|
| |
No intended functionality change.
llvm-svn: 224935
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The else case ResultReg was not checked for validity.
To my surprise, this case was not hit in any of the
existing test cases. This includes a new test cases
that tests this path.
Also drop the `target triple` declaration from the
original test as suggested by H.J. Lu, because
apparently with it the test won't be run on Linux
llvm-svn: 224901
|
|
|
|
|
|
|
|
| |
Adds missing memory instruction variants to AVX false dependency breaking handling. (SSE was handled in r224246)
Differential Revision: http://reviews.llvm.org/D6780
llvm-svn: 224900
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
@llvm.cttz/ctlz.
If the control flow is modelling an if-statement where the only instruction in
the 'then' basic block (excluding the terminator) is a call to cttz/ctlz,
CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control
flow graph.
Example:
\code
entry:
%cmp = icmp eq i64 %val, 0
br i1 %cmp, label %end.bb, label %then.bb
then.bb:
%c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
br label %end.bb
end.bb:
%cond = phi i64 [ %c, %then.bb ], [ 64, %entry]
\code
In this example, basic block %then.bb is taken if value %val is not zero.
Also, the phi node in %end.bb would propagate the size-of in bits of %val
only if %val is equal to zero.
With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb
into basic block %entry only if cttz is cheap to speculate for the target.
Added two new hooks in TargetLowering.h to let targets customize the behavior
(i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The
two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'.
By default, both methods return 'false'.
On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has
LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI.
Differential Revision: http://reviews.llvm.org/D6728
llvm-svn: 224899
|
|
|
|
|
|
| |
with illegal immediates. Correctly this time. I did the wrong patterns the first time.
llvm-svn: 224891
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Determining the address of a TLS variable results in a function call in
certain TLS models. This means that a simple ICmpInst might actually
result in invalidating the CTR register.
In such cases, do not attempt to rely on the CTR register for loop
optimization purposes.
This fixes PR22034.
Differential Revision: http://reviews.llvm.org/D6786
llvm-svn: 224890
|
|
|
|
|
|
| |
without asserts. NFC.
llvm-svn: 224889
|
|
|
|
|
|
| |
-Wunused-but-set-variable warning; NFC.
llvm-svn: 224888
|
|
|
|
|
|
| |
with illegal immediates. Forgot to do this when I did SSE/SSE2/AVX/AVX2.
llvm-svn: 224887
|
|
|
|
|
|
| |
cmp.ps/pd/ss/sd instead of truncating the immediate. The assembly parser and instruction selection shouldn't generate invalid immediates.
llvm-svn: 224886
|
|
|
|
|
|
| |
immediates. The frontend now checks this when the builtin is used. This will allow the instruction printer to not have to deal with invalid immediates on these instructions.
llvm-svn: 224885
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Consider the following IR:
%3 = load i8* undef
%4 = trunc i8 %3 to i1
%5 = call %jl_value_t.0* @foo(..., i1 %4, ...)
ret %jl_value_t.0* %5
Bools (that are the result of direct truncs) are lowered as whatever
the argument to the trunc was and a "and 1", causing the part of the
MBB responsible for this argument to look something like this:
%vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7
Later, when the load is lowered, it will insert
%vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14
but remember to (at the end of isel) replace vreg7 by vreg15. Now for
the bug. In fast isel lowering, we mistakenly mark vreg8 as the result
of the load instead of the trunc. This adds a fixup to have
vreg8 replaced by whatever the result of the load is as well, so
we end up with
%vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15
which is an SSA violation and causes problems later down the road.
This fixes PR21557.
Test Plan: Test test case from PR21557 is added to the test suite.
Reviewers: ributzka
Reviewed By: ributzka
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6245
llvm-svn: 224884
|
|
|
|
| |
llvm-svn: 224871
|
|
|
|
| |
llvm-svn: 224870
|