to move the stack pointer in prologs/epilogs. I will fix the test and add it
back later.
llvm-svn: 174484
Failure: undefined symbol 'Lline_table_start0'.
Root cause: we use a symbol subtraction to calculate at_stmt_list, but
the line table entries are not dumped in the assembly.
Fix: use zero instead of a symbol subtraction for Compile Unit 0.
llvm-svn: 174479
llvm-svn: 174464
We generate one line table for each compilation unit in the object file.
Reviewed by Eric and Kevin.
rdar://problem/13067005
llvm-svn: 174445
differentiate between the alignment of the
base point of a load and the overall alignment of the load. This caused
infinite loops in DAG combine with the original application of this patch.
ORIGINAL COMMIT LOG:
When the target-independent DAGCombiner inferred a higher alignment for a load,
it would replace the load with one with the higher alignment. However, it did
not place the new load in the worklist, which prevented later DAG combines in
the same phase (for example, target-specific combines) from ever seeing it.
This patch corrects that oversight, and updates some tests whose output changed
due to slightly different DAGCombine outputs.
llvm-svn: 174431
This was fixed by r174402.
llvm-svn: 174405
alignment for a load,"
It caused hangs when compiling clang/lib/Parse/ParseDecl.cpp and
clang/lib/Driver/Tools.cpp during stage2 on some hosts.
llvm-svn: 174374
it would replace the load with one with the higher alignment. However, it did
not place the new load in the worklist, which prevented later DAG combines in
the same phase (for example, target-specific combines) from ever seeing it.
This patch corrects that oversight, and updates some tests whose output changed
due to slightly different DAGCombine outputs.
llvm-svn: 174343
lowering.
Fixes PR15141.
llvm-svn: 174327
This required disabling a PowerPC optimization that did the following:
input:
  x = BUILD_VECTOR <i32 16, i32 16, i32 16, i32 16>
lowered to:
  tmp = BUILD_VECTOR <i32 8, i32 8, i32 8, i32 8>
  x = ADD tmp, tmp
The add now gets folded immediately and we're back at the BUILD_VECTOR we
started from. I don't see a way to fix this currently so I left it disabled
for now.
Fix some trivially foldable X86 tests too.
llvm-svn: 174325
The main lists of debug info metadata attached to the compile_unit had an extra
layer of metadata nodes they went through for no apparent reason. This patch
removes that (and still passes just as much of the GDB 7.5 test suite). If anyone
can show evidence as to why these extra metadata nodes are there, I'm open to
reverting this patch and documenting why they're there.
llvm-svn: 174266
Fix a bug in DAGCombine. The symptom is mistakenly optimizing expression
"x + x*x" into "x * 3.0".
llvm-svn: 174239
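
For illustration, a minimal C++ sketch (not from the commit; the function name is made up) of why that fold is wrong: x + x*x equals x * (1.0 + x), which matches x * 3.0 only when x is 0.0 or 2.0.

  // Hypothetical reproducer, assuming fast-math-style reassociation is enabled;
  // with the bug, the return expression could be rewritten as x * 3.0f.
  float f(float x) {
    return x + x * x;   // equals x * (1.0f + x), not x * 3.0f in general
  }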
1) allows the use of RIP-relative addressing in 32-bit LEA instructions under
x86-64 (ILP32 and LP64)
2) makes the size of address registers in 64-bit LEA instructions independent
of the ILP32/LP64 data model.
llvm-svn: 174208
past the natural stack alignment.
llvm-svn: 174085
register for inline asm. This conforms to how gcc allows effective
casting of inputs into GPRs (FPRs are already handled).
llvm-svn: 174008
llvm-svn: 174006
llvm-svn: 173997
llvm-svn: 173988
right shift amount type.
Fix that by adding a cast to the shift expander. This came up with vector shifts
on SSE-less X86 CPUs.
<2 x i64> = shl <2 x i64> <2 x i64>
-> i64,i64 = shl i64 i64; shl i64 i64
-> i32,i32,i32,i32 = shl_parts i32 i32 i64; shl_parts i32 i32 i64
Now we cast the last two i64s to the right type. Fixes the crash in PR14668.
llvm-svn: 173615
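
A rough C++ sketch (assumed, not the actual PR14668 test case; the typedef and function name are made up) of source that reaches this path: a <2 x i64> variable shift on a 32-bit, SSE-less x86 target is split into i64 shifts and then expanded via shift-parts, where the shift amount must be cast to the parts' operand type.

  // Uses GCC/Clang vector extensions; targeting e.g. i386 without SSE
  // exercises the shift-parts expansion described above.
  typedef unsigned long long v2u64 __attribute__((vector_size(16)));

  v2u64 shl2(v2u64 a, v2u64 n) {
    return a << n;   // lane-wise shift; each i64 lane goes through shl_parts
  }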
This catches many cases where we can emit a more efficient shuffle for a
specific mask or when the mask contains undefs. Once the splat is lowered to
unpacks we can't do that anymore.
There is a possibility of moving the promotion after pshufb matching, but I'm
not sure if pshufb with a mask loaded from memory is faster than 3 shuffles, so
I avoided that for now.
llvm-svn: 173569
llvm-svn: 173568
(defined by the x32 ABI) mode, in which case its pointers are 32 bits
in size. This knowledge is also added to X86RegisterInfo, which now
returns the appropriate registers in getPointerRegClass.
There are many outcomes to this change. In order to keep the patches
separate and manageable, we start by focusing on some simple testable
cases. The patch adds a test that passes a pointer to a function,
focusing on the difference between the two data models for x86-64.
Another test is added for handling of 'sret' arguments (and
functionality is added in X86ISelLowering to make it work).
A note on naming: the "x32 ABI" document refers to the AMD64
architecture (in LLVM it's distinguished by being is64Bits() in the
x86 subtarget) with two variations: the LP64 (default) data model, and
the ILP32 data model. This patch adds predicates to the subtarget
which are consistent with this naming scheme.
llvm-svn: 173503
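
As a rough illustration (not the actual committed test; the function name is made up), consider passing a pointer to a function: under the default LP64 model the pointer is a 64-bit value, while under ILP32/x32 it is 32 bits wide even though the target is still 64-bit x86.

  // Minimal sketch: under LP64 the pointer argument is a 64-bit quantity
  // (e.g. in %rdi); under ILP32/x32 it is 32 bits wide even in 64-bit mode.
  int load(int *p) {
    return *p;
  }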
use them in tests that run llvm-dwarfdump. This makes the tests as
specific as possible.
llvm-svn: 173498
Allow the strategy to select SchedDFS. Allow the results of SchedDFS
to affect initialization of the scheduler state.
llvm-svn: 173425
interface and allow other strategies to select it.
llvm-svn: 173413
The requirements of the strong heuristic are:
* A protector is required for functions which contain an array, regardless of
type or length.
* A protector is required for functions which contain a structure/union which
contains an array, regardless of type or length. Note that there is no limit to
the depth of nesting.
* A protector is required when the address of a local (i.e., stack-based)
variable is exposed, e.g., through a local whose address is taken as part of
the RHS of an assignment or as part of a function argument.
llvm-svn: 173231
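
For illustration, a small C++ sketch (not one of the committed tests; all names are made up) of code that would require a protector under each of the rules above:

  void observe(int *p);   // assumed external helper that lets the address escape

  int triggers_ssp_strong(int n) {
    char buf[8];                              // rule 1: local array
    struct { int pad; char inner[4]; } s;     // rule 2: aggregate containing an array
    int local = n;
    observe(&local);                          // rule 3: address of a stack variable is exposed
    buf[0] = s.inner[0] = (char)n;
    return buf[0] + local;
  }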
SSPStrong applies a heuristic to insert stack protectors in these situations:
* A protector is required for functions which contain an array, regardless of
type or length.
* A protector is required for functions which contain a structure/union which
contains an array, regardless of type or length. Note that there is no limit to
the depth of nesting.
* A protector is required when the address of a local (i.e., stack-based)
variable is exposed, e.g., through a local whose address is taken as part of
the RHS of an assignment or as part of a function argument.
This patch implements the SSPStrong attribute to be equivalent to
SSPRequired. This will change in a subsequent patch.
llvm-svn: 173230
- Add list of physical registers clobbered in pseudo atomic insts
Physical registers are clobbered when pseudo atomic instructions are
expanded. Add them to the clobber list to prevent the DAG scheduler from
mis-scheduling them after these insns are declared side-effect free.
- Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com>
llvm-svn: 173200
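
For context, a hedged source-level sketch (assumed, not the committed test; the function name is made up) of the kind of operation involved: an atomic RMW whose old value is used expands late into a compare-exchange loop, which clobbers EAX and EFLAGS, so those physical registers must appear in the pseudo instruction's clobber list.

  #include <atomic>

  // The old value is needed, so on x86 this lowers to a cmpxchg loop rather
  // than a single locked instruction.
  int clear_bits(std::atomic<int> &v, int mask) {
    return v.fetch_and(~mask);
  }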
On valgrind the processor is reported as:
Host CPU: athlon-fx
llvm-svn: 172983
The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends.
This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical.
Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume
that there is only one SEXT node. The AVX mask optimization is one example. Additionally, this optimization does not update the cost model.
llvm-svn: 172968
llvm-svn: 172894
vinsertf128) is faster than using a single vmovups instruction.
llvm-svn: 172868
It sometimes causes spurious failures on lit win32.
Feel free to prune or suppress each output.
llvm-svn: 172823
v8i8 -> v8i64,
v8i8 -> v8i32,
v4i8 -> v4i64,
v4i16 -> v4i64
for AVX and AVX2.
Bug 14865.
llvm-svn: 172708
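
A rough C++ sketch (assumed source pattern, not the committed test; names are made up) of one of the listed cases, v8i8 -> v8i32: eight bytes are loaded and each element is zero-extended to 32 bits, which maps naturally onto a single zero-extending vector move on AVX2.

  #include <cstdint>

  // Whether this autovectorizes depends on the compiler and flags; it is only
  // meant to show the widening pattern, not to reproduce the exact test.
  void widen_v8i8_to_v8i32(const uint8_t *src, uint32_t *dst) {
    for (int i = 0; i < 8; ++i)
      dst[i] = src[i];   // zero-extending 8-bit load into a 32-bit store
  }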
Those can occur when something between the sextload and the store is on the same
chain and blocks isel. Fixes PR14887.
llvm-svn: 172353
Adds a check for -Oz, changes the code to not re-visit BBs,
and skips over DBG_VALUE instrs.
Patch by Andy Zhang.
llvm-svn: 172258
This removes previous special cases for each floating-point type in favour of a
shared codepath.
llvm-svn: 172189
underscore in symbol on linux.
llvm-svn: 172139
than the string size.
llvm-svn: 172124
Part of rdar://12991541
llvm-svn: 172121
It cached XOR's operands before calling visitXOR() but failed to update the
operands when visitXOR changed the XOR node.
rdar://12968664
llvm-svn: 171999
llvm-svn: 171956
two constant.
PR 14848. The lowered sequence is based on the existing sequence the target-independent
DAG Combiner creates for the scalar case.
Patch by Zvi Rackover.
llvm-svn: 171953
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.
When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.
It includes tests.
This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces
Patch by Andy Zhang.
llvm-svn: 171879
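
For illustration, a hedged example (not from the patch's tests; the name is made up) of the kind of function the pass targets: it returns almost immediately after entry, so fewer than four cycles are likely to elapse before the ret and the pass would pad it with NOPs when compiling for Atom.

  // Hypothetical short function: the body is a single load followed by ret.
  int get_value(const int *p) {
    return *p;
  }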
cvtss2si, cvttss2si, cvtsd2si, and cvttsd2si to match gas behavior.
cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix
should be treated the same as an 'l' suffix. Printing should always print a
suffix. Previously we didn't parse or print an 'l' suffix.
cvtt*2si/cvt*2si should parse with an 'l' or 'q' suffix or no suffix at all. No
suffix should use the destination register size to choose the encoding.
Printing should not print a suffix.
Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein.
llvm-svn: 171668
Thanks to Nick Lewycky for the initial patch.
llvm-svn: 171665
hasSideEffects=1 because they can trap when dividing by 0. This is needed to
keep early if-conversion from moving them across basic blocks.
llvm-svn: 171608
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.
When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.
It includes tests.
Patch by Andy Zhang.
llvm-svn: 171603
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.
When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.
It includes tests.
Patch by Andy Zhang.
llvm-svn: 171524
tests fail. Original message:
Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1.
Added a test.
llvm-svn: 171468
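
As a rough illustration (assumed, not the original test; names are made up) of the property being exploited: each lane of a vector compare is all-zeros or all-ones, so truncating the result to a narrower lane type still yields 0 or -1 per lane.

  // Uses Clang's vector extensions and __builtin_convertvector.
  typedef int   v4i32 __attribute__((vector_size(16)));
  typedef short v4i16 __attribute__((vector_size(8)));

  v4i16 cmp_lt(v4i32 a, v4i32 b) {
    v4i32 m = a < b;                           // each lane is 0 or -1
    return __builtin_convertvector(m, v4i16);  // truncation preserves 0 / -1
  }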