summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86
Commit message (Collapse)AuthorAgeFilesLines
* Remove this test in the meantime, since it won't pass on Atom. Atom uses leaEli Bendersky2013-02-061-22/+0
| | | | | | | to move the stack pointer in prologs/epilogs. I will fix the test and add it back later. llvm-svn: 174484
* Attempt to recover gdb bot after r174445.Manman Ren2013-02-062-2/+2
| | | | | | | | | Failure: undefined symbol 'Lline_table_start0'. Root-cause: we use a symbol subtraction to calculate at_stmt_list, but the line table entries are not dumped in the assembly. Fix: use zero instead of a symbol subtraction for Compile Unit 0. llvm-svn: 174479
* Test for r174446Eli Bendersky2013-02-051-0/+22
| | | | llvm-svn: 174464
* Dwarf: support for LTO where a single object file can have multiple line tablesManman Ren2013-02-053-3/+3
| | | | | | | | | We generate one line table for each compilation unit in the object file. Reviewed by Eric and Kevin. rdar://problem/13067005 llvm-svn: 174445
* Reapply r174343, with a fix for a scary DAG combine bug where it failed to ↵Owen Anderson2013-02-051-3/+3
| | | | | | | | | | | | | | | | | | differentiate between the alignment of the base point of a load, and the overall alignment of the load. This caused infinite loops in DAG combine with the original application of this patch. ORIGINAL COMMIT LOG: When the target-independent DAGCombiner inferred a higher alignment for a load, it would replace the load with one with the higher alignment. However, it did not place the new load in the worklist, which prevented later DAG combines in the same phase (for example, target-specific combines) from ever seeing it. This patch corrects that oversight, and updates some tests whose output changed due to slightly different DAGCombine outputs. llvm-svn: 174431
* Add a test case for PR14750.Jakob Stoklund Olesen2013-02-051-2/+40
| | | | | | This was fixed by r174402. llvm-svn: 174405
* Revert r174343, "When the target-independent DAGCombiner inferred a higher ↵NAKAMURA Takumi2013-02-051-3/+3
| | | | | | | | alignment for a load," It caused hangups in compiling clang/lib/Parse/ParseDecl.cpp and clang/lib/Driver/Tools.cpp in stage2 on some hosts. llvm-svn: 174374
* When the target-independent DAGCombiner inferred a higher alignment for a load,Owen Anderson2013-02-051-3/+3
| | | | | | | | | | | it would replace the load with one with the higher alignment. However, it did not place the new load in the worklist, which prevented later DAG combines in the same phase (for example, target-specific combines) from ever seeing it. This patch corrects that oversight, and updates some tests whose output changed due to slightly different DAGCombine outputs. llvm-svn: 174343
* X86: Open up some opportunities for constant folding by postponing shift ↵Benjamin Kramer2013-02-041-0/+10
| | | | | | | | lowering. Fixes PR15141. llvm-svn: 174327
* SelectionDAG: Teach FoldConstantArithmetic how to deal with vectors.Benjamin Kramer2013-02-043-7/+6
| | | | | | | | | | | | | | | | | This required disabling a PowerPC optimization that did the following: input: x = BUILD_VECTOR <i32 16, i32 16, i32 16, i32 16> lowered to: tmp = BUILD_VECTOR <i32 8, i32 8, i32 8, i32 8> x = ADD tmp, tmp The add now gets folded immediately and we're back at the BUILD_VECTOR we started from. I don't see a way to fix this currently so I left it disabled for now. Fix some trivially foldable X86 tests too. llvm-svn: 174325
* Remove the (apparently) unnecessary debug info metadata indirection.David Blaikie2013-02-022-10/+5
| | | | | | | | | | The main lists of debug info metadata attached to the compile_unit had an extra layer of metadata nodes they went through for no apparent reason. This patch removes that (& still passes just as much of the GDB 7.5 test suite). If anyone can show evidence as to why these extra metadata nodes are there I'm open to reverting this patch & documenting why they're there. llvm-svn: 174266
* rdar://13126763Shuxin Yang2013-02-021-0/+42
| | | | | | | Fix a bug in DAGCombine. The symptom is mistakenly optimizing expression "x + x*x" into "x * 3.0". llvm-svn: 174239
* Two changes relevant to LEA and x32:David Sehr2013-02-012-0/+41
| | | | | | | | | 1) allows the use of RIP-relative addressing in 32-bit LEA instructions under x86-64 (ILP32 and LP64) 2) separates the size of address registers in 64-bit LEA instructions from control by ILP32/LP64. llvm-svn: 174208
* When lowering memcpys to loads and stores, make sure we don't promote alignmentsLang Hames2013-01-311-25/+52
| | | | | | past the natural stack alignment. llvm-svn: 174085
* Check and allow floating point registers to select the size of theEric Christopher2013-01-311-0/+42
| | | | | | | register for inline asm. This conforms to how gcc allows for effective casting of inputs into gprs (fprs is already handled). llvm-svn: 174008
* Replace some more greps with FileChecks in testsEli Bendersky2013-01-313-22/+30
| | | | llvm-svn: 174006
* Rewrite this test properly with a FileCheck instead of grepsEli Bendersky2013-01-311-8/+10
| | | | llvm-svn: 173997
* Forgot the test case before.Evan Cheng2013-01-301-0/+40
| | | | llvm-svn: 173988
* When the legalizer is splitting vector shifts, the result may not have the ↵Benjamin Kramer2013-01-271-0/+11
| | | | | | | | | | | | | | | right shift amount type. Fix that by adding a cast to the shift expander. This came up with vector shifts on sse-less X86 CPUs. <2 x i64> = shl <2 x i64> <2 x i64> -> i64,i64 = shl i64 i64; shl i64 i64 -> i32,i32,i32,i32 = shl_parts i32 i32 i64; shl_parts i32 i32 i64 Now we cast the last two i64s to the right type. Fixes the crash in PR14668. llvm-svn: 173615
* X86: Do splat promotion later, so the optimizer can chew on it first.Benjamin Kramer2013-01-262-17/+7
| | | | | | | | | | | | This catches many cases where we can emit a more efficient shuffle for a specific mask or when the mask contains undefs. Once the splat is lowered to unpacks we can't do that anymore. There is a possibility of moving the promotion after pshufb matching, but I'm not sure if pshufb with a mask loaded from memory is faster than 3 shuffles, so I avoided that for now. llvm-svn: 173569
* FileCheckize and merge some tests.Benjamin Kramer2013-01-264-127/+227
| | | | llvm-svn: 173568
* In this patch, we teach X86_64TargetMachine that it has a ILP32Eli Bendersky2013-01-252-4/+43
| | | | | | | | | | | | | | | | | | | | | (defined by the x32 ABI) mode, in which case its pointers are 32-bits in size. This knowledge is also added to X86RegisterInfo that now returns the appropriate registers in getPointerRegClass. There are many outcomes to this change. In order to keep the patches separate and manageable, we start by focusing on some simple testable cases. The patch adds a test with passing a pointer to a function - focusing on the difference between the two data models for x86-64. Another test is added for handling of 'sret' arguments (and functionality is added in X86ISelLowering to make it work). A note on naming: the "x32 ABI" document refers to the AMD64 architecture (in LLVM it's distinguished by being is64Bits() in the x86 subtarget) with two variations: the LP64 (default) data model, and the ILP32 data model. This patch adds predicates to the subtarget which are consistent with this naming scheme. llvm-svn: 173503
* Now that llvm-dwarfdump supports flags to specify which DWARF section to dump,Eli Bendersky2013-01-251-1/+1
| | | | | | | use them in tests that run llvm-dwarfdump. This is in order to make tests as specific as possible. llvm-svn: 173498
* MIsched: Improve the interface to SchedDFS analysis (subtrees).Andrew Trick2013-01-251-3/+0
| | | | | | | Allow the strategy to select SchedDFS. Allow the results of SchedDFS to affect initialization of the scheduler state. llvm-svn: 173425
* MISched: Add SchedDFSResult to ScheduleDAGMI to formalize theAndrew Trick2013-01-251-0/+3
| | | | | | interface and allow other strategies to select it. llvm-svn: 173413
* Add the heuristic to differentiate SSPStrong from SSPRequired.Bill Wendling2013-01-231-12/+2518
| | | | | | | | | | | | | | | | | | The requirements of the strong heuristic are: * A Protector is required for functions which contain an array, regardless of type or length. * A Protector is required for functions which contain a structure/union which contains an array, regardless of type or length. Note, there is no limit to the depth of nesting. * A protector is required when the address of a local variable (i.e., stack based variable) is exposed. (E.g., such as through a local whose address is taken as part of the RHS of an assignment or a local whose address is taken as part of a function argument.) llvm-svn: 173231
* Add the IR attribute 'sspstrong'.Bill Wendling2013-01-231-21/+628
| | | | | | | | | | | | | | | | | | | | | SSPStrong applies a heuristic to insert stack protectors in these situations: * A Protector is required for functions which contain an array, regardless of type or length. * A Protector is required for functions which contain a structure/union which contains an array, regardless of type or length. Note, there is no limit to the depth of nesting. * A protector is required when the address of a local variable (i.e., stack based variable) is exposed. (E.g., such as through a local whose address is taken as part of the RHS of an assignment or a local whose address is taken as part of a function argument.) This patch implements the SSPString attribute to be equivalent to SSPRequired. This will change in a subsequent patch. llvm-svn: 173230
* Fix an issue of pseudo atomic instruction DAG scheduleMichael Liao2013-01-221-0/+110
| | | | | | | | | | - Add list of physical registers clobbered in pseudo atomic insts Physical registers are clobbered when pseudo atomic instructions are expanded. Add them in clobber list to prevent DAG scheduler to mis-schedule them after these insns are declared side-effect free. - Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 173200
* llvm/test/CodeGen/X86/win_ftol2.ll: Add -cpu=generic to appease valgrind.NAKAMURA Takumi2013-01-201-1/+1
| | | | | | | On valgrind the processor is reported; Host CPU: athlon-fx llvm-svn: 172983
* Revert 172708.Nadav Rotem2013-01-202-68/+0
| | | | | | | | | The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends. This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical. Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume that there is only one SEXT node. The AVX mask optimizations is one example. Additionally this optimization does not update the cost model. llvm-svn: 172968
* On Sandybridge split unaligned 256bit stores into two xmm-sized stores. Nadav Rotem2013-01-198-27/+38
| | | | llvm-svn: 172894
* On Sandybridge loading unaligned 256bits using two XMM loads (vmovups and ↵Nadav Rotem2013-01-182-1/+22
| | | | | | vinsertf128) is faster than using a single vmovups instruction. llvm-svn: 172868
* llvm/test/CodeGen/X86/Atomics-64.ll: Tweak for 2nd RUN not to overwrite %t. ↵NAKAMURA Takumi2013-01-181-2/+2
| | | | | | | | It sometimes causes spurious failure on lit win32. Feel free to prune or suppress each output. llvm-svn: 172823
* Optimization for the following SIGN_EXTEND pairs:Elena Demikhovsky2013-01-172-0/+80
| | | | | | | | | | | | v8i8 -> v8i64, v8i8 -> v8i32, v4i8 -> v4i64, v4i16 -> v4i64 for AVX and AVX2. Bug 14865. llvm-svn: 172708
* X86: Add patterns for X86ISD::VSEXT in registers.Benjamin Kramer2013-01-131-0/+176
| | | | | | | Those can occur when something between the sextload and the store is on the same chain and blocks isel. Fixes PR14887. llvm-svn: 172353
* Update patch for the pad short functions pass for Intel Atom (only).Preston Gurd2013-01-111-0/+25
| | | | | | | | | Adds a check for -Oz, changes the code to not re-visit BBs, and skips over DBG_VALUE instrs. Patch by Andy Zhang. llvm-svn: 172258
* Simplify writing floating types to assembly.Tim Northover2013-01-111-0/+40
| | | | | | | This removes previous special cases for each floating-point type in favour of a shared codepath. llvm-svn: 172189
* llvm/test/CodeGen/X86/ms-inline-asm.ll: Fixup; Globals doesn't have leading ↵NAKAMURA Takumi2013-01-101-2/+2
| | | | | | underscore in symbol on linux. llvm-svn: 172139
* PR14896: Handle memcpy from constant string where the memcpy size is larger ↵Evan Cheng2013-01-101-0/+13
| | | | | | than the string size. llvm-svn: 172124
* [ms-inline asm] Add support for calling functions from inline assembly.Chad Rosier2013-01-101-0/+18
| | | | | | Part of rdar://12991541 llvm-svn: 172121
* Fix a DAG combine bug visitBRCOND() is transforming br(xor(x, y)) to br(x != y).Evan Cheng2013-01-091-0/+41
| | | | | | | | | It cahced XOR's operands before calling visitXOR() but failed to update the operands when visitXOR changed the XOR node. rdar://12968664 llvm-svn: 171999
* add -march to the testNadav Rotem2013-01-091-1/+1
| | | | llvm-svn: 171956
* Efficient lowering of vector sdiv when the divisor is a splatted power of ↵Nadav Rotem2013-01-091-0/+72
| | | | | | | | | | | two constant. PR 14848. The lowered sequence is based on the existing sequence the target-independent DAG Combiner creates for the scalar case. Patch by Zvi Rackover. llvm-svn: 171953
* Pad Short Functions for Intel AtomPreston Gurd2013-01-084-5/+83
| | | | | | | | | | | | | | | | | | | | | | | | The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. This patch has been updated to address Nadav's review comments - Optimize only at >= O1 and don't do optimization if -Os is set - Stores MachineBasicBlock* instead of BBNum - Uses DenseMap instead of std::map - Fixes placement of braces Patch by Andy Zhang. llvm-svn: 171879
* Fix suffix handling for parsing and printing of cvtsi2ss, cvtsi2sd, ↵Craig Topper2013-01-063-5/+5
| | | | | | | | | | | cvtss2si, cvttss2si, cvtsd2si, and cvttsd2si to match gas behavior. cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix. cvtt*2si/cvt*2si should parse with an 'l' or 'q' suffix or not suffix at all. No suffix should use the destination register size to choose encoding. Printing should not print a suffix. Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein. llvm-svn: 171668
* Fix for PR14739. It's not safe to fold a load into a call across a store. ↵Evan Cheng2013-01-061-4/+21
| | | | | | Thanks to Nick Lewycky for the initial patch. llvm-svn: 171665
* Recommit r171461 which was incorrectly reverted. Mark DIV/IDIV instructions ↵Craig Topper2013-01-051-0/+31
| | | | | | hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks. llvm-svn: 171608
* Revert revision 171524. Original message:Nadav Rotem2013-01-054-76/+5
| | | | | | | | | | | | | | | | | | | | URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev Log: The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171603
* The current Intel Atom microarchitecture has a feature whereby when a functionPreston Gurd2013-01-044-5/+76
| | | | | | | | | | | | | | | | | returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171524
* Revert revision: 171467. This transformation is incorrect and makes some ↵Nadav Rotem2013-01-041-15/+0
| | | | | | | | | tests fail. Original message: Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test. llvm-svn: 171468
OpenPOWER on IntegriCloud