path: root/llvm/test/CodeGen
* Bring r226038 back. (Rafael Espindola, 2015-01-19; 7 files, -46/+2)
  No change in this commit, but clang was changed to also produce trivial comdats when needed.
  Original message:
  Don't create new comdats in CodeGen.
  This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848.
  llvm-svn: 226467
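  As an illustration (not part of the commit; a minimal sketch in present-day IR syntax, which differed slightly in 2015), "setting the comdat explicitly" means the frontend emits a comdat in the module instead of codegen creating one implicitly:
    $foo = comdat any
    define void @foo() comdat($foo) {
      ret void
    }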
* [MIScheduler] Slightly better handling of constrainLocalCopy when both source and dest are local (Michael Kuperstein, 2015-01-19; 3 files, -17/+54)
  This fixes PR21792.
  Differential Revision: http://reviews.llvm.org/D6823
  llvm-svn: 226433
* [PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2 (Hal Finkel, 2015-01-19; 1 file, -2/+2)
  Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call instructions in order to represent the dependency on the TOC base pointer value. Restricting this to ELF V2, however, does not seem to make sense: calls under ELF V1 have the same dependence, and indirect calls have an r2 dependence just as direct ones do. Make sure the dependence is noted for all calls under both ELF V1 and ELF V2.
  llvm-svn: 226432
* R600: Remove redundant test (Matt Arsenault, 2015-01-18; 3 files, -15/+2)
  This is already covered in ftrunc.ll.
  llvm-svn: 226412
* [X86][SSE] Added scalar min/max folding tests. NFC. (Simon Pilgrim, 2015-01-18; 1 file, -4/+40)
  llvm-svn: 226406
* [X86][SSE] Added float extract and xmm extract/insert stack folding tests. NFC. (Simon Pilgrim, 2015-01-18; 1 file, -3/+26)
  llvm-svn: 226405
* [X86][SSE] Added scalar conversion stack folding tests. NFC. (Simon Pilgrim, 2015-01-18; 1 file, -12/+156)
  llvm-svn: 226404
* AVX1 stack folding tests. NFC. (Simon Pilgrim, 2015-01-18; 1 file, -31/+1246)
  Began adding more exhaustive tests; all floating-point instructions should now be either tested or have placeholders. We do seem to have a number of missing instructions; I will add a patch for review once the remaining working instructions are added. I'll then move on to SSE tests and then the integer instructions.
  llvm-svn: 226400
* [PowerPC] Initial PPC64 calling-convention changes for fastcc (Hal Finkel, 2015-01-18; 2 files, -0/+596)
  The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is designed to work with both prototyped and non-prototyped/varargs functions. As a result, GPRs and stack space are allocated for every argument, even those that are passed in floating-point or vector registers.
  GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that do not have their address taken) to use the 'fast' calling convention. When functions are using the 'fast' calling convention, don't allocate GPRs for arguments passed in other types of registers, and don't allocate stack space for arguments passed in registers.
  Other changes for the fast calling convention may be added in the future.
  llvm-svn: 226399
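  For illustration (hypothetical IR, not from the commit), the GlobalOpt transformation described above turns an internal, non-varargs function whose address is never taken into a fastcc function; with fastcc, the PPC64 backend now skips the shadow GPR and stack allocation for the FPR argument:
    define internal fastcc double @callee(double %x) {
      ret double %x
    }
    define double @caller(double %x) {
      %r = call fastcc double @callee(double %x)
      ret double %r
    }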
* [PowerPC] Don't list R11 as a patchpoint scratch register (Hal Finkel, 2015-01-17; 2 files, -3/+3)
  R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is reserved for use as an "environment pointer" for compilation models that require such a thing. We don't require one, we also don't need a second scratch register, and because we support only "local" patchpoint call targets, we might as well let R11 be used for anyregcc patchpoints.
  llvm-svn: 226369
* Improve DAG combine pass on certain IR vector patterns (Mehdi Amini, 2015-01-17; 1 file, -0/+42)
  Loading two 2x32-bit float vectors into the bottom half of a 256-bit vector produced suboptimal code in AVX2 mode with certain IR combinations.
  In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32 (undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical, but then mysteriously generated rather bad code; the movq/movhpd combination didn't match.
  The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs would get promoted to 4f32 by the type legalizer, eventually resulting in a BUILD_VECTOR of two 4f32 into an 8f32. The BUILD_VECTOR combine, recognizing these were both half the output size, concatenated them and then produced a shuffle. However, the resulting concat + shuffle was more complex than it should be; in the case where the upper half of the output is undef, we probably want to generate shuffle + concat instead.
  This enhancement causes the vector_shuffle combine step to recognize this suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR in case the same suboptimal pattern occurs for other reasons.
  This results in the optimizer correctly producing the optimal movq + movhpd sequence for all three variations on this IR, even with AVX2. I've included a test case.
  Radar link: rdar://problem/19287012
  Fix for PR 21943.
  From: Fiona Glaser <fglaser@apple.com>
  llvm-svn: 226360
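  As a sketch of the IR shape involved (hypothetical and written with current load syntax; the committed test is authoritative), two 2f32 loads shuffled into an 8f32 whose upper half is undef:
    define <8 x float> @pair_to_8f32(<2 x float>* %a, <2 x float>* %b) {
      %va = load <2 x float>, <2 x float>* %a
      %vb = load <2 x float>, <2 x float>* %b
      ; upper four lanes are undef; ideally this lowers to movq + movhpd
      %v = shufflevector <2 x float> %va, <2 x float> %vb, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
      ret <8 x float> %v
    }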
* R600: Clean up floor tests (Matt Arsenault, 2015-01-16; 4 files, -151/+146)
  These were using different naming schemes, not using multiple check prefixes and not using -LABEL.
  llvm-svn: 226333
* [Hexagon] Converting halfword to doubleword multiply intrinsics. (Colin LeMahieu, 2015-01-16; 1 file, -0/+270)
  llvm-svn: 226326
* [Hexagon] Converting accumulating halfword multiply intrinsics to patterns. (Colin LeMahieu, 2015-01-16; 1 file, -0/+773)
  llvm-svn: 226324
* [Hexagon] Beginning to convert intrinsics to patterns instead of duplicated definitions. Converting halfword multiply intrinsics. (Colin LeMahieu, 2015-01-16; 1 file, -0/+445)
  llvm-svn: 226318
* [AVX512] Add intrinsics for masked aligned FP loads and stores (Adam Nemet, 2015-01-16; 1 file, -1/+67)
  Similar to the unaligned cases.
  Test was generated with update_llc_test_checks.py.
  Part of <rdar://problem/17688758>
  llvm-svn: 226296
* [AVX512] Remove trailing whitespaces in this test (Adam Nemet, 2015-01-16; 1 file, -12/+12)
  llvm-svn: 226295
* [X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0. (Andrea Di Biagio, 2015-01-16; 1 file, -0/+52)
  This patch disables the target-specific combine on X86ISD::INSERTPS dag nodes if optlevel is CodeGenOpt::None.
  The backend currently implements a target-specific combine rule that converts a vector load used by an INSERTPS dag node into a scalar load plus a scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of two instructions (i.e. a vector load plus INSERTPSrr).
  However, the existing target combine rule on INSERTPS nodes only works under the assumption that ISel will always be able to match an INSERTPSrm. This is not true in general at -O0, since the backend only allows folding a load into the memory operand of an instruction if the optimization level is not CodeGenOpt::None.
  In the example below:
    __m128 test(__m128 a, __m128 *b) {
      __m128 c = _mm_insert_ps(a, *b, 1 << 6);
      return c;
    }
  Before this patch, at -O0, the backend would have canonicalized the load from 'b' into a scalar load plus scalar_to_vector. Later on, ISel would have selected an INSERTPSrr, leaving the insertps mask in an inconsistent state:
    movss 4(%rdi), %xmm1
    insertps $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3]
  With this patch, the backend avoids folding the vector load into the operand of the INSERTPS. The new codegen at -O0 is:
    movaps (%rdi), %xmm1
    insertps $64, %xmm1, %xmm0 # %xmm1[1],xmm0[1,2,3]
  llvm-svn: 226277
* [X86] Refactored stack memory folding tests to explicitly force register spilling (Simon Pilgrim, 2015-01-16; 1 file, -62/+60)
  The current 'big vectors' stack folded reload testing pattern is very bulky and makes it difficult to test all instructions, as big vectors will tend to use only the ymm instruction implementations.
  This patch changes the tests to use a nop call that lists explicit xmm registers as sideeffects; with this we can force a partial register spill of the relevant registers and then check that the reload is correctly folded. The asm generated only adds the forced spill, a nop instruction and a couple of extra labels (a fraction of the current approach).
  More exhaustive tests will follow shortly. I've added some extra tests (the xmm versions of some of the existing folding tests) as a starting point.
  Differential Revision: http://reviews.llvm.org/D6932
  llvm-svn: 226264
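  For reference, the nop-call spill pattern described above looks roughly like this (a sketch patterned on the committed tests, not quoted from them):
    define <4 x float> @stack_fold_addps(<4 x float> %a0, <4 x float> %a1) {
      ; The asm clobbers xmm2-xmm15, so its "=x" result must land in xmm0 or
      ; xmm1, evicting an argument to the stack; CHECK lines can then verify
      ; that the reload is folded into the addps.
      %1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm2},~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{dirflag},~{fpsr},~{flags}"()
      %2 = fadd <4 x float> %a0, %a1
      ret <4 x float> %2
    }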
* Revert r226242 - Revert Revert Don't create new comdats in CodeGen (Timur Iskhodzhanov, 2015-01-16; 7 files, -2/+46)
  This breaks AddressSanitizer (ninja check-asan) on Windows.
  llvm-svn: 226251
* [PowerPC] Adjust PatchPoints for ppc64le (Hal Finkel, 2015-01-16; 1 file, -15/+19)
  Bill Schmidt pointed out that some adjustments would be needed to properly support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available as a scratch register, so we need to use R12. R12 is also available under ELF V1, so to maintain consistency, I flipped the order to make R12 the first scratch register in the array under both ABIs.
  llvm-svn: 226247
* Revert "Revert Don't create new comdats in CodeGen"Rafael Espindola2015-01-167-46/+2
| | | | | | | | | | | | | | | | | | This reverts commit r226173, adding r226038 back. No change in this commit, but clang was changed to also produce trivial comdats for costructors, destructors and vtables when needed. Original message: Don't create new comdats in CodeGen. This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848. llvm-svn: 226242
* R600/SI: Add patterns for v_cvt_{flr|rpi}_i32_f32 (Matt Arsenault, 2015-01-15; 2 files, -0/+155)
  llvm-svn: 226230
* R600/SI: Fix trailing comma with modifiers (Matt Arsenault, 2015-01-15; 1 file, -1/+14)
  Instructions with 1 operand can still use source modifiers, so make sure we don't print an extra comma afterwards.
  llvm-svn: 226226
* [PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls (Hal Finkel, 2015-01-15; 2 files, -1/+48)
  Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the POWER7, A2 and earlier cores) are really pointers to a function descriptor, a structure with three pointers: the actual pointer to the code to which to jump, the pointer to the TOC needed by the callee, and an environment pointer. We used to chain these loads, and make them opaque to the rest of the optimizer, so that they'd always occur directly before the call. This is not necessary, and in fact, highly suboptimal on embedded cores. Once the function pointer is known, the loads can be performed ahead of time; in fact, they can be hoisted out of loops.
  Now these function descriptors are almost always generated by the linker, and thus the contents of the descriptors are invariant. As a result, by default, we'll mark the associated loads as invariant (allowing them to be hoisted out of loops). I've added a target feature to turn this off, however, just in case someone needs that option (constructing an on-stack descriptor, casting it to a function pointer, and then calling it cannot be well-defined C/C++ code, but I can imagine some JIT-compilation system doing so).
  Consider this simple test:
    $ cat call.c
    typedef void (*fp)();
    void bar(fp x) {
      for (int i = 0; i < 1600000000; ++i)
        x();
    }
    $ cat main.c
    typedef void (*fp)();
    void bar(fp x);
    void foo() {}
    int main() {
      bar(foo);
    }
  On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads as invariant brings the execution time down to ~8 seconds from ~32 seconds with the loads in the loop. The difference on the POWER7 is smaller. Compiling with:
    gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2]
    clang -O3 -mcpu=native call.c main.c : ~5.3 seconds
    clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds
  (looks like we'd benefit from additional loop unrolling here, as a first guess, because this is faster with the extra loads)
  The -mno-invariant-function-descriptors option will be added to Clang shortly.
  llvm-svn: 226207
* [Hexagon] Updating indexed load-extend patterns and changing test to new expected output. (Colin LeMahieu, 2015-01-15; 1 file, -20/+20)
  llvm-svn: 226206
* Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to ↵Hal Finkel2015-01-152-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reserved registers"" Reapply r226071 with fixes. Two fixes: 1. We need to manually remove the old and create the new 'deaf defs' associated with physical register definitions when we move the definition of the physical register from the copy point to the point of the original vreg def. This problem was picked up by the machinstr verifier, and could trigger a verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've turned on the verifier in the tests. 2. When moving the def point of the phys reg up, we need to make sure that it is neither defined nor read in between the two instructions. We don't, however, extend the live ranges of phys reg defs to cover uses, so just checking for live-range overlap between the pair interval and the phys reg aliases won't pick up reads. As a result, we manually iterate over the range and check for reads. A test soon to be committed to the PowerPC backend will test this change. Original commit message: [RegisterCoalescer] Remove copies to reserved registers This allows the RegisterCoalescer to join "non-flipped" range pairs with a physical destination register -- which allows the RegisterCoalescer to remove copies like this: <vreg> = something (maybe a load, for example) ... (things that don't use PHYSREG) PHYSREG = COPY <vreg> (with all of the restrictions normally applied by the RegisterCoalescer: having compatible register classes, etc. ) Previously, the RegisterCoalescer handled only the opposite case (copying *from* a physical register). I don't handle the problem fully here, but try to get the common case where there is only one use of <vreg> (the COPY). An upcoming commit to the PowerPC backend will make this pattern much more common on PPC64/ELF systems. llvm-svn: 226200
* R600/SI: Improve fpext / fptrunc test coverage (Matt Arsenault, 2015-01-15; 2 files, -8/+78)
  llvm-svn: 226197
* R600/SI: Use 64-bit encoding by default for opcodes that are VOP3-only on VI (Marek Olsak, 2015-01-15; 2 files, -36/+33)
  llvm-svn: 226190
* statepoint tests: use statepoint-example gc (Ramkumar Ramachandra, 2015-01-15; 3 files, -28/+28)
  Mechanical conversion of the statepoint tests to use the statepoint-example gc.
  llvm-svn: 226183
* [Hexagon] Deleting old float comparison instruction and updating references to new ones. (Colin LeMahieu, 2015-01-15; 1 file, -1/+1)
  llvm-svn: 226179
* [Hexagon] Replacing old fadd/fsub instructions and updating references. (Colin LeMahieu, 2015-01-15; 2 files, -2/+2)
  llvm-svn: 226176
* Revert Don't create new comdats in CodeGen (Timur Iskhodzhanov, 2015-01-15; 7 files, -2/+46)
  It breaks AddressSanitizer on Windows.
  llvm-svn: 226173
* [mips] Fix a typo in the compare patterns for MIPS32r6/MIPS64r6. (Daniel Sanders, 2015-01-15; 1 file, -0/+90)
  Summary: The patterns intended for the SETLE node were actually matching the SETLT node.
  Reviewers: atanasyan, sstankovic, vmedic
  Reviewed By: vmedic
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D6997
  llvm-svn: 226171
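  For illustration (hypothetical IR, not from the commit), a signed less-than-or-equal comparison is what reaches the SETLE patterns in question:
    define i32 @sle(i32 signext %a, i32 signext %b) {
      %cmp = icmp sle i32 %a, %b
      %res = zext i1 %cmp to i32
      ret i32 %res
    }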
* Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"Hal Finkel2015-01-152-24/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reverting this while I investigate some bad behavior this is causing. As a possibly-related issue, adding -verify-machineinstrs to one of the test cases now fails because of this change: llc test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll -march=x86-64 -o - -verify-machineinstrs *** Bad machine code: No instruction at def index *** - function: foo - basic block: BB#0 return (0x10007e21f10) [0B;736B) - liverange: [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[78 4r,784d:0) 0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r - register: %DS Valno #3 is defined at 624r *** Bad machine code: Live segment doesn't end at a valid instruction *** - function: foo - basic block: BB#0 return (0x10007e21f10) [0B;736B) - liverange: [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[78 4r,784d:0) 0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r - register: %DS [624r,624d:3) LLVM ERROR: Found 2 machine code errors. where 624r corresponds exactly to the interval combining change: 624B %RSP<def> = COPY %vreg16; GR64:%vreg16 Considering merging %vreg16 with %RSP RHS = %vreg16 [608r,624r:0) 0@608r updated: 608B %RSP<def> = MOV64rm <fi#3>, 1, %noreg, 0, %noreg; mem:LD8[%saved_stack.1] Success: %vreg16 -> %RSP Result = %RSP llvm-svn: 226086
* [RegisterCoalescer] Remove copies to reserved registers (Hal Finkel, 2015-01-15; 2 files, -2/+24)
  This allows the RegisterCoalescer to join "non-flipped" range pairs with a physical destination register -- which allows the RegisterCoalescer to remove copies like this:
    <vreg> = something (maybe a load, for example)
    ... (things that don't use PHYSREG)
    PHYSREG = COPY <vreg>
  (with all of the restrictions normally applied by the RegisterCoalescer: having compatible register classes, etc.)
  Previously, the RegisterCoalescer handled only the opposite case (copying *from* a physical register). I don't handle the problem fully here, but try to get the common case where there is only one use of <vreg> (the COPY).
  An upcoming commit to the PowerPC backend will make this pattern much more common on PPC64/ELF systems.
  llvm-svn: 226071
* getMangledTypeStr: clarify how it mangles types, and add tests (Philip Reames, 2015-01-14; 1 file, -0/+47)
  "Write a set of tests that show how name mangling is done for overloaded intrinsics." These happen to use gc.relocates to exercise the codepath in question, but this is not a GC-specific test.
  Patch by: artagnon@gmail.com
  Differential Revision: http://reviews.llvm.org/D6915
  llvm-svn: 226056
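  For illustration (hypothetical declarations, not from the patch), an overloaded intrinsic's mangled name encodes the concrete types it is instantiated with; pointer parameters also encode their address space:
    ; scalar vs. vector instantiations of the same overloaded intrinsic
    declare i32 @llvm.ctlz.i32(i32, i1)
    declare <4 x i32> @llvm.ctlz.v4i32(<4 x i32>, i1)
    ; p0i8/p1i8 denote i8 pointers in address spaces 0 and 1 (2015-era signature)
    declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)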
* IR: Move MDLocation into place (Duncan P. N. Exon Smith, 2015-01-14; 43 files, -290/+290)
  This commit moves `MDLocation`, finishing off PR21433. There's an accompanying clang commit for frontend testcases. I'll attach the testcase upgrade script I used to PR21433 to help out-of-tree frontends/backends.
  This changes the schema for `DebugLoc` and `DILocation` from:
    !{i32 3, i32 7, !7, !8}
  to:
    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)
  Note that empty fields (line/column: 0 and inlinedAt: null) don't get printed by the assembly writer.
  llvm-svn: 226048
* Don't create new comdats in CodeGen. (Rafael Espindola, 2015-01-14; 7 files, -46/+2)
  This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848.
  llvm-svn: 226038
* [MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement. (Chandler Carruth, 2015-01-14; 1 file, -0/+112)
  Some benchmarks have shown that this could lead to a potential performance benefit, so we are adding some flags to try to help measure the difference.
  A possible explanation: in diamond-shaped CFGs (A followed by either B or C, both followed by D), putting B and C both in between A and D leads to the code being less dense than it could be. Either B or C always has to be skipped, increasing the chance of cache misses etc. Moving either B or C to after D might be beneficial on average.
  In the long run, we should probably do a better job of analyzing the basic block and branch probabilities to move the correct one of B or C to after D. But even if we don't use this in the long run, it is a good baseline for benchmarking.
  Original patch authored by Daniel Jasper with test tweaks and a second flag added by me.
  Differential Revision: http://reviews.llvm.org/D6969
  llvm-svn: 226034
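  For illustration (hypothetical IR, not from the patch), a diamond-shaped CFG of the kind described:
    define i32 @diamond(i1 %cond, i32 %x) {
    entry:
      br i1 %cond, label %bb.b, label %bb.c
    bb.b:                       ; one of bb.b/bb.c is always skipped at runtime
      %vb = add i32 %x, 1
      br label %bb.d
    bb.c:
      %vc = mul i32 %x, 3
      br label %bb.d
    bb.d:
      %r = phi i32 [ %vb, %bb.b ], [ %vc, %bb.c ]
      ret i32 %r
    }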
* [PPC64] Add support for the ICBT instruction on POWER8. (Bill Schmidt, 2015-01-14; 2 files, -0/+35)
  Patch by Kit Barton.
  Support for the ICBT instruction is currently present, but limited to embedded processors. This change adds a new FeatureICBT that can be used to identify whether the ICBT instruction is available on a specific processor.
  Two new tests are added:
  * A positive test to ensure the icbt instruction is present when using -mcpu=pwr8
  * A negative test to ensure the icbt instruction is not generated when using -mcpu=pwr7
  Both test cases use the Prefetch opcode in LLVM. They are based on the ppc64-prefetch.ll test case. A sketch of the pattern follows below.
  llvm-svn: 226033
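  For illustration (a sketch in the spirit of ppc64-prefetch.ll, not quoted from the new tests), the Prefetch opcode is reached through the llvm.prefetch intrinsic; a cache-type operand of 0 requests an instruction prefetch, which is what ICBT provides:
    declare void @llvm.prefetch(i8*, i32, i32, i32)
    define void @test_icbt(i8* %p) {
      ; args: address, rw (0 = read), locality (0-3), cache type (0 = instruction)
      call void @llvm.prefetch(i8* %p, i32 0, i32 3, i32 0)
      ret void
    }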
* Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding. (Olivier Sallenave, 2015-01-14; 2 files, -0/+50)
  llvm-svn: 225987
* [mips] Refine octeon instructions seq/seqi/sne/snei (Kai Nacke, 2015-01-14; 1 file, -0/+66)
  This commit refines the pattern for the octeon seq/seqi/sne/snei instructions. The target register is set to 0 or 1 according to the result of the comparison. In C, this is something like
    rd = (unsigned long)(rs == rt);
  This commit adds a zext to bring the result to i64. With this change the instruction is selected for this type of code. (gcc produces the same code for the above C code.)
  llvm-svn: 225968
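  For illustration (hypothetical IR, not from the commit), the icmp-plus-zext shape that now selects to seq:
    define i64 @test_seq(i64 %rs, i64 %rt) {
      %cmp = icmp eq i64 %rs, %rt
      %res = zext i1 %cmp to i64
      ret i64 %res
    }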
* Use the integrated assembler by default on SPARC. (Brad Smith, 2015-01-14; 3 files, -3/+3)
  llvm-svn: 225957
* Revert "Insert random noops to increase security against ROP attacks (llvm)"JF Bastien2015-01-144-173/+0
| | | | | | | This reverts commit: http://reviews.llvm.org/D3392 llvm-svn: 225948
* Disable a couple of tests, CodeGen/X86/noop-insert.ll and CodeGen/X86/noop-insert-percentage.ll, in r225908, to unbreak tests. (NAKAMURA Takumi, 2015-01-14; 2 files, -0/+4)
  llvm-svn: 225940
* ARM: add test for crc32 instructions in CodeGen. (Tim Northover, 2015-01-14; 1 file, -0/+58)
  Somehow we seem to have ended up without any actual tests of the CodeGen side. Easy enough to fix.
  llvm-svn: 225930
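  For illustration (a sketch assuming the existing llvm.arm.crc32* intrinsics; not quoted from the new test), a CodeGen test here checks that the intrinsic call selects to the crc32b instruction:
    declare i32 @llvm.arm.crc32b(i32, i32)
    define i32 @test_crc32b(i32 %acc, i32 %val) {
      %r = call i32 @llvm.arm.crc32b(i32 %acc, i32 %val)
      ret i32 %r
    }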
* [PowerPC] Fix the noop-insert test (Hal Finkel, 2015-01-14; 1 file, -3/+3)
  The form of nops used is CPU-specific (some CPUs, such as the POWER7, have special group-terminating nops). We probably want a different callback for this kind of nop insertion (something more like MCAsmBackend::writeNopData), or for PPC to use a different mechanism for scheduling nops, but this will stop the test from failing for now.
  llvm-svn: 225928
* R600/SI: Remove some redundant load testcases. (Matt Arsenault, 2015-01-14; 4 files, -76/+9)
  This reduces coverage for Evergreen, since the more complete tests have those run lines disabled.
  llvm-svn: 225927
* R600/SI: Fix bad code with unaligned byte vector loads (Matt Arsenault, 2015-01-14; 1 file, -17/+7)
  Don't do the v4i8 -> v4f32 combine if the load will need to be expanded due to alignment. This stops adding instructions to repack into a single register that the v_cvt_ubyteN_f32 instructions read.
  llvm-svn: 225926
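  For illustration (hypothetical IR in the spirit of the affected test), the combine applies to loads like this; when alignment forces the load to be expanded (e.g. align 1), the combine is now skipped:
    define <4 x float> @load_v4i8_to_v4f32(<4 x i8> addrspace(1)* %in) {
      %v = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 1
      %cvt = uitofp <4 x i8> %v to <4 x float>
      ret <4 x float> %cvt
    }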