path: root/llvm/lib
* [CodeGen] Add hook/combine to form vector extloads, enabled on X86. (Ahmed Bougacha, 2015-02-05, 3 files, -12/+128)

  The combine that forms extloads used to be disabled on vector types,
  because "None of the supported targets knows how to perform load and
  sign extend on vectors in one instruction." That's not entirely true,
  since at least SSE4.1 X86 knows how to do those sextloads/zextloads
  (with PMOVS/ZX). But there are several aspects to getting this right.

  First, vector extloads are controlled by a profitability callback.
  For instance, on ARM, several instructions have folded extload forms,
  so it's not always beneficial to create an extload node (and trying
  to match extloads is a whole 'nother can of worms).

  The interesting optimization enables folding of s/zextloads to
  illegal (splittable) vector types, expanding them into smaller legal
  extloads. It's not ideal (it introduces some legalization-like
  behavior in the combine) but it's better than the obvious
  alternative: form illegal extloads, and later try to split them up.
  If you do that, you might generate extloads that can't be split up,
  but have a valid ext+load expansion. At vector-op legalization time,
  it's too late to generate this kind of code, so you end up forced to
  scalarize. It's better to just avoid creating egregiously illegal
  nodes.

  This optimization is enabled unconditionally on X86.

  Note that the splitting combine is happy with "custom" extloads. As
  is, this bypasses the actual custom lowering, and just unrolls the
  extload. But from what I've seen, this is still much better than the
  current custom lowering, which does some kind of unrolling at the end
  anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
  FIXME).

  Also note that the existing combine that forms extloads is now also
  enabled on legal vectors. This doesn't have a big effect on X86
  (because sext+load is usually combined to sext_inreg+aextload). On
  ARM it fires on some rare occasions; that's for a separate commit.

  Differential Revision: http://reviews.llvm.org/D6904
  llvm-svn: 228325
* X86 ABI fix for return values > 24 bytes. (Andrew Trick, 2015-02-05, 1 file, -8/+9)

  The return value's address must be returned in %rax, i.e. the callee
  needs to copy the sret argument (%rdi) into the return value (%rax).
  This probably won't manifest as a bug when the caller is
  LLVM-compiled code. But it is an ABI guarantee and tools expect it.

  llvm-svn: 228321
* [Hexagon] Renaming A2_addi and formatting. (Colin LeMahieu, 2015-02-05, 7 files, -37/+34)
  llvm-svn: 228318
* move fold comments to the corresponding fold; NFC (Sanjay Patel, 2015-02-05, 1 file, -3/+9)
  llvm-svn: 228317
* [Hexagon] Since decoding conflicts have been resolved, isCodeGenOnly = 0 by default and remove explicitly setting it. (Colin LeMahieu, 2015-02-05, 6 files, -532/+207)
  llvm-svn: 228316
* LowerSwitch: Use ConstantInt for CaseRange::{Low,High} (Hans Wennborg, 2015-02-05, 1 file, -20/+20)

  Case values are always ConstantInt. This allows us to remove a bunch
  of casts. NFC.

  llvm-svn: 228312
* LowerSwitch: remove default args from CaseRange ctor; NFC (Hans Wennborg, 2015-02-05, 1 file, -3/+2)
  llvm-svn: 228311
* R600/SI: Fix bug in TTI loop unrolling preferences (Tom Stellard, 2015-02-05, 1 file, -1/+1)

  We should be setting UnrollingPreferences::MaxCount to MAX_UINT
  instead of UnrollingPreferences::Count. Count is a 'forced unrolling
  factor', while MaxCount sets an upper limit to the unrolling factor.

  Setting Count to MAX_UINT was causing the loop in the testcase to be
  unrolled 15 times, when it only had a maximum of 4 iterations.

  llvm-svn: 228303
* R600/SI: Fix bug from insertion of llvm.SI.end.cf into loop headers (Tom Stellard, 2015-02-05, 1 file, -3/+24)

  The llvm.SI.end.cf intrinsic is used to mark the end of if-then
  blocks, if-then-else blocks, and loops. It is responsible for
  updating the exec mask to re-enable threads that had been masked
  during the preceding control flow block. For example:

    s_mov_b64 exec, 0x3                 ; Initial exec mask
    s_mov_b64 s[0:1], exec              ; Saved exec mask
    v_cmpx_gt_u32 exec, s[2:3], v0, 0   ; llvm.SI.if
    do_stuff()
    s_or_b64 exec, exec, s[0:1]         ; llvm.SI.end.cf

  The bug fixed by this patch was one where the llvm.SI.end.cf
  intrinsic was being inserted into the header of loops. This would
  happen when an if block terminated in a loop header and we would end
  up with code like this:

    s_mov_b64 exec, 0x3                 ; Initial exec mask
    s_mov_b64 s[0:1], exec              ; Saved exec mask
    v_cmpx_gt_u32 exec, s[2:3], v0, 0   ; llvm.SI.if
    do_stuff()

  LOOP:                                 ; Start of loop header
    s_or_b64 exec, exec, s[0:1]         ; llvm.SI.end.cf <- BUG: The exec
                                        ; mask has the same value at the
                                        ; beginning of each loop iteration.
    do_stuff();
    s_cbranch_execnz LOOP

  The fix is to create a new basic block before the loop and insert the
  llvm.SI.end.cf there. This way the exec mask is restored before the
  start of the loop instead of at the beginning of each iteration.

  llvm-svn: 228302
* [PowerPC] Implement the vclz instructions for PWR8 (Bill Schmidt, 2015-02-05, 2 files, -4/+22)

  Patch by Kit Barton.

  Add the vector count leading zeros instruction for byte, halfword,
  word, and doubleword sizes. This is a fairly straightforward addition
  after the changes made for vpopcnt:

  1. Add the correct definitions for the various instructions in
     PPCInstrAltivec.td
  2. Make the CTLZ operation legal on vector types when using P8Altivec
     in PPCISelLowering.cpp

  Test Plan: Created new test case in test/CodeGen/PowerPC/vec_clz.ll
  to check the instructions are being generated when the CTLZ operation
  is used in LLVM. Check the encoding and decoding in
  test/MC/PowerPC/ppc_encoding_vmx.s and
  test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively.

  llvm-svn: 228301
* Add a FIXME. (Rafael Espindola, 2015-02-05, 1 file, -0/+3)
  Thanks to Eric for the suggestion.
  llvm-svn: 228300
* Removing an unused variable warning I accidentally introduced with my last warning fix; NFC. (Aaron Ballman, 2015-02-05, 1 file, -1/+1)
  llvm-svn: 228295
* Silencing an MSVC warning about a switch statement with no cases; NFC. (Aaron Ballman, 2015-02-05, 1 file, -8/+5)
  llvm-svn: 228294
* [X86][MMX] Handle i32->mmx conversion using movd (Bruno Cardoso Lopes, 2015-02-05, 4 files, -0/+38)

  Implement a BITCAST dag combine to transform i32->mmx conversion
  patterns into an X86-specific node (MMX_MOVW2D) and guarantee that
  moves between i32 and x86mmx are better handled, i.e., don't use
  store-load to do the conversion.

  llvm-svn: 228293
* [X86][MMX] Move MMX DAG node to proper file (Bruno Cardoso Lopes, 2015-02-05, 2 files, -3/+8)
  llvm-svn: 228291
* Teach isDereferenceablePointer() to look through bitcast constant expressions. (Michael Kuperstein, 2015-02-05, 1 file, -1/+1)

  This fixes a LICM regression due to the new load+store pair
  canonicalization.

  Differential Revision: http://reviews.llvm.org/D7411
  llvm-svn: 228284
* [X86] Add xrstors/xsavec/xsaves/clflushopt/clwb/pcommit instructions (Craig Topper, 2015-02-05, 3 files, -4/+27)
  llvm-svn: 228283
* [X86] Remove two feature flags that covered sets of instructions that have no patterns or intrinsics. (Craig Topper, 2015-02-05, 6 files, -21/+4)

  Since we don't check feature flags in the assembler parser for any
  instruction sets, these flags don't provide any value. This frees up
  2 of the fully utilized feature flags.

  llvm-svn: 228282
* R600/SI: Fix i64 truncate to i1 (Matt Arsenault, 2015-02-05, 1 file, -0/+6)
  llvm-svn: 228273
* Disable enumeral mismatch warning when compiling llvm with gcc. (Larisse Voufo, 2015-02-05, 1 file, -2/+3)

  Tested with gcc 4.9.2. Compiling with -Werror was producing:

    .../llvm/lib/Target/X86/X86ISelLowering.cpp: In function
    'llvm::SDValue lowerVectorShuffleAsBitMask(llvm::SDLoc, llvm::MVT,
    llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>,
    llvm::SelectionDAG&)':
    .../llvm/lib/Target/X86/X86ISelLowering.cpp:7771:40: error:
    enumeral mismatch in conditional expression:
    'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType'
    [-Werror=enum-compare]
       V = DAG.getNode(VT.isFloatingPoint() ? X86ISD::FAND : ISD::AND, DL, VT, V,
                                            ^

  llvm-svn: 228271
* Implement new heuristic for complete loop unrolling. (Michael Zolotukhin, 2015-02-05, 1 file, -2/+332)

  Complete loop unrolling can make some loads constant, thus enabling a
  lot of other optimizations. To catch such cases, we look for loads
  that might become constants and estimate the number of instructions
  that would be simplified or become dead after substitution.

  Example: Suppose we have:

    int a[] = {0, 1, 0};
    v = 0;
    for (i = 0; i < 3; i ++)
      v += b[i]*a[i];

  If we completely unroll the loop, we would get:

    v = b[0]*a[0] + b[1]*a[1] + b[2]*a[2]

  Which then will be simplified to:

    v = b[0]* 0 + b[1]* 1 + b[2]* 0

  And finally:

    v = b[1]

  llvm-svn: 228265
* Value soft float calls as more expensive in the inliner. (Cameron Esfahani, 2015-02-05, 5 files, -1/+46)

  Summary: When evaluating floating point instructions in the inliner,
  ask the TTI whether it is an expensive operation. By default, it's
  not an expensive operation. This keeps the default behavior the same
  as before. The ARM TTI has been updated to return back TCC_Expensive
  for targets which don't have hardware floating point.

  Reviewers: chandlerc, echristo
  Reviewed By: echristo
  Subscribers: t.p.northover, aemerson, llvm-commits
  Differential Revision: http://reviews.llvm.org/D6936

  llvm-svn: 228263
* Try to fix the build in MCValue.cpp (Reid Kleckner, 2015-02-05, 1 file, -1/+1)
  llvm-svn: 228256
* Fixup. (Sean Silva, 2015-02-05, 1 file, -2/+2)
  Didn't see these calls in my release build locally when testing.
  llvm-svn: 228254
* [MC] Remove various unused MCAsmInfo parameters. (Sean Silva, 2015-02-05, 4 files, -15/+10)
  llvm-svn: 228244
* IR: Rename 'operator ==()' to 'isKeyOf()', NFC (Duncan P. N. Exon Smith, 2015-02-05, 1 file, -4/+4)
  `isKeyOf()` is a clearer name than overloading `operator==()`.
  llvm-svn: 228242
* [Hexagon] Deleting unused instructions and adding isCodeGenOnly to some defs. (Colin LeMahieu, 2015-02-05, 3 files, -34/+8)
  llvm-svn: 228238
* [Hexagon] Updating load extend to i64 patterns. (Colin LeMahieu, 2015-02-04, 1 file, -85/+30)
  llvm-svn: 228237
* [fuzzer] add flag prefer_small_during_initial_shuffle, be a bit more verbose (Kostya Serebryany, 2015-02-04, 4 files, -6/+32)
  llvm-svn: 228235
* [Hexagon] Cleaning up i1 load and extension patterns. (Colin LeMahieu, 2015-02-04, 1 file, -24/+11)
  llvm-svn: 228232
* [Hexagon] Simplifying more load and store patterns and using new addressing patterns. (Colin LeMahieu, 2015-02-04, 1 file, -72/+41)
  llvm-svn: 228231
* R600/SI: Enable subreg liveness by default (Tom Stellard, 2015-02-04, 1 file, -0/+4)
  llvm-svn: 228228
* [Hexagon] Simplifying some load and store patterns. (Colin LeMahieu, 2015-02-04, 1 file, -68/+35)
  llvm-svn: 228227
* AsmParser: Split out LineField, NFC (Duncan P. N. Exon Smith, 2015-02-04, 1 file, -2/+17)
  Split out `LineField`, which restricts the legal line numbers. This
  will make it easier to be consistent between different node parsers.
  llvm-svn: 228226
* [Hexagon] Converting absolute-address load patterns to use AddrGP. (Colin LeMahieu, 2015-02-04, 1 file, -48/+13)
  llvm-svn: 228225
* [Hexagon] Converting atomic store/load to use AddrGP addressing. (Colin LeMahieu, 2015-02-04, 1 file, -33/+10)
  llvm-svn: 228223
* [Hexagon] Simplifying some store patterns. Adding AddrGP addressing forms. (Colin LeMahieu, 2015-02-04, 3 files, -24/+16)
  llvm-svn: 228220
* [fuzzer] add -runs=N to limit the number of runs per session. Also, make sure we do some mutations w/o cross over. (Kostya Serebryany, 2015-02-04, 4 files, -9/+18)
  llvm-svn: 228214
* Fix GCC error caused by r228211 (Duncan P. N. Exon Smith, 2015-02-04, 1 file, -0/+4)
  llvm-svn: 228213
* IR: Reduce boilerplate in DenseMapInfo overrides, NFC (Duncan P. N. Exon Smith, 2015-02-04, 1 file, -91/+63)
  Minimize the boilerplate required for the `MDNode` subclass
  `DenseMapInfo<>` overrides in `LLVMContextImpl`.
  llvm-svn: 228212
* AsmParser: Move MDField details to source file, NFC (Duncan P. N. Exon Smith, 2015-02-04, 2 files, -38/+44)
  Move all the types of `MDField` to an anonymous namespace in the
  source file. This also eliminates the duplication of `ParseMDField()`
  declarations in the header for each new field type.
  llvm-svn: 228211
* AsmParser: Simplify assertion, NFC (Duncan P. N. Exon Smith, 2015-02-04, 1 file, -1/+1)
  llvm-svn: 228209
* AsmParser: Remove dead code, NFC (Duncan P. N. Exon Smith, 2015-02-04, 1 file, -4/+0)
  This condition is checked in the generic `ParseMDField()`.
  llvm-svn: 228208
* AsmParser: Simplify MDUnsignedField (Duncan P. N. Exon Smith, 2015-02-04, 2 files, -17/+14)
  We only need `uint64_t` for storage.
  llvm-svn: 228205
* IR: Initialize MDNode abbreviations en masse, NFC (Duncan P. N. Exon Smith, 2015-02-04, 1 file, -3/+4)
  llvm-svn: 228203
* IR: Define MDNode uniquing sets automatically, NFC (Duncan P. N. Exon Smith, 2015-02-04, 1 file, -3/+2)
  llvm-svn: 228200
* Don't try to make sections in comdats SHF_MERGE. (Rafael Espindola, 2015-02-04, 1 file, -4/+4)

  Parts of llvm were not expecting it and we wouldn't print the entity
  size of the section.

  Given what comdats are used for, having SHF_MERGE sections would be
  just a small improvement, so just disable it for now.

  Fixes pr22463.

  llvm-svn: 228196
* R600/SI: Expand misaligned 16-bit memory accesses (Tom Stellard, 2015-02-04, 1 file, -0/+5)
  llvm-svn: 228190
* R600/SI: Make more store operations legal (Tom Stellard, 2015-02-04, 2 files, -12/+0)

  v2i32, i32, trunc i32 to i16, and trunc i32 to i8 stores are legal
  for all address spaces. We had marked them as custom in order to
  lower them for the private address space, but this is no longer
  necessary.

  This enables lowering of misaligned stores of these types in the
  DAGLegalizer.

  llvm-svn: 228189
* R600: Don't promote i64 stores to v2i32 during DAG legalization (Tom Stellard, 2015-02-04, 2 files, -3/+25)

  We take care of this during instruction selection now. This fixes a
  potential infinite loop when lowering misaligned stores.

  llvm-svn: 228188