summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC
Commit message (Collapse)AuthorAgeFilesLines
* [DebugInfo] Add debug locations to constant SD nodesSergey Dmitrouk2015-04-286-270/+304
| | | | | | | | | | | | | | | | | | | | | | | This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977
* Silence unused variable errors for no-asserts buildsBill Schmidt2015-04-271-0/+4
| | | | llvm-svn: 235913
* [PPC64LE] Remove unnecessary swaps from lane-insensitive vector computationsBill Schmidt2015-04-276-0/+824
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new SSA MI pass that runs on little-endian PPC64 code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors without alignment constraints are accomplished for little-endian using lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional xxswapd instructions hurts performance in comparison with big-endian code, but they are necessary in the general case to support correct semantics. However, the general case does not apply to most vector code. Many vector instructions are lane-insensitive; they do not "care" which lanes the parallel computations are performed within, provided that the resulting data is stored into the correct locations. Thus this pass looks for computations that perform only lane-insensitive operations, and remove the unnecessary swaps from loads and stores in such computations. Future improvements will allow computations using certain lane-sensitive operations to also be optimized in this manner, by modifying the lane-sensitive operations to account for the permuted order of the lanes. However, this patch only adds the infrastructure to permit this; no lane-sensitive operations are optimized at this time. This code is heavily exercised by the various vectorizing applications in the projects/test-suite tree. For the time being, I have only added one simple test case to demonstrate what the pass is doing. Although it is quite simple, it provides coverage for much of the code, including the special case handling of copies and subreg-to-reg operations feeding the swaps. I plan to add additional tests in the future as I fill in more of the "special handling" code. Two existing tests were affected, because they expected the swaps to be present, but they are now removed. llvm-svn: 235910
* [AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr.Lang Hames2015-04-241-150/+150
| | | | | | | AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a reference for this is crufty. llvm-svn: 235752
* [PowerPC] Support register name prefixes for vector registersHal Finkel2015-04-231-0/+8
| | | | | | | Match binutils by supporting the optional register name prefix for new vector registers ("vs" for VSX registers and "q" for QPX registers). llvm-svn: 235665
* [PowerPC] Use sync inst alias when printingHal Finkel2015-04-231-1/+1
| | | | | | | So long as the choice between printing msync and sync is not ambiguous, we can print 'sync 0' and just 'sync'. llvm-svn: 235663
* [PowerPC] Add asm/disasm support for dcbt with hintHal Finkel2015-04-234-8/+123
| | | | | | | | | | | | | | | | | | Add assembler/disassembler support for dcbt/dcbtst (and aliases) with the hint field specified (non-zero). Unforunately, the syntax for this instruction is special in that it differs for server vs. embedded cores: dcbt ra, rb, th [server] dcbt th, ra, rb [embedded] where th can be omitted when it is 0. dcbtst is the same. Thus we need to play games in the parser and the printer to flip the operands around on the embedded cores. We'll use the server syntax as the default (binutils currently uses the embedded form by default, but IBM is changing that). We also stop marking dcbtst as having unmodeled side effects (this is not necessary, it is just a hint like dcbt -- noticed by inspection, so no separate test case). llvm-svn: 235657
* [PowerPC] Enable printing instructions using aliasesHal Finkel2015-04-232-2/+8
| | | | | | | | | | | TableGen had been nicely generating code to print a number of instructions using shorter aliases (and PowerPC has plenty of short mnemonics), but we were not calling it. For some of the aliases we support in the parser, TableGen can't infer the "inverse" alias relationship, so there is still more to do. Thus, after some hours of updating test cases... llvm-svn: 235616
* Re-commit r235560: Switch lowering: extract jump tables and bit tests before ↵Hans Wennborg2015-04-231-5/+6
| | | | | | | | | | | building binary tree (PR22262) Third time's the charm. The previous commit was reverted as a reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--' on an iterator at the beginning of a vector, causing asserts when using debugging iterators. This commit fixes that. llvm-svn: 235608
* Revert r235560; this commit was causing several failed assertions in Debug ↵Aaron Ballman2015-04-231-6/+5
| | | | | | builds using MSVC's STL. The iterator is being used outside of its valid range. llvm-svn: 235597
* Switch lowering: extract jump tables and bit tests before building binary ↵Hans Wennborg2015-04-221-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tree (PR22262) This is a re-commit of r235101, which also fixes the problems with the previous patch: - Switches with only a default case and non-fallthrough were handled incorrectly - The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here. > This is a major rewrite of the SelectionDAG switch lowering. The previous code > would lower switches as a binary tre, discovering clusters of cases > suitable for lowering by jump tables or bit tests as it went along. To increase > the likelihood of finding jump tables, the binary tree pivot was selected to > maximize case density on both sides of the pivot. > > By not selecting the pivot in the middle, the binary trees would not always > be balanced, leading to performance problems in the generated code. > > This patch rewrites the lowering to search for clusters of cases > suitable for jump tables or bit tests first, and then builds the binary > tree around those clusters. This way, the binary tree will always be balanced. > > This has the added benefit of decoupling the different aspects of the lowering: > tree building and jump table or bit tests finding are now easier to tweak > separately. > > For example, this will enable us to balance the tree based on profile info > in the future. > > The algorithm for finding jump tables is quadratic, whereas the previous algorithm > was O(n log n) for common cases, and quadratic only in the worst-case. This > doesn't seem to be major problem in practice, e.g. compiling a file consisting > of a 10k-case switch was only 30% slower, and such large switches should be rare > in practice. Compiling e.g. gcc.c showed no compile-time difference. If this > does turn out to be a problem, we could limit the search space of the algorithm. > > This commit also disables all optimizations during switch lowering in -O0. > > Differential Revision: http://reviews.llvm.org/D8649 llvm-svn: 235560
* [PowerPC] Flow oversized lines for r235309Bill Schmidt2015-04-201-11/+25
| | | | llvm-svn: 235310
* [PowerPC] Add future work for vector insert/extract to README_ALTIVEC.txtBill Schmidt2015-04-201-0/+14
| | | | llvm-svn: 235309
* [mc] Clean up emission of byte sequencesBenjamin Kramer2015-04-171-6/+1
| | | | | | No functional change intended. llvm-svn: 235178
* Use raw_pwrite_stream in the object writer/streamer.Rafael Espindola2015-04-144-11/+10
| | | | | | The ELF object writer will take advantage of that in the next commit. llvm-svn: 234950
* Correct 'teh' and other typos / repeated words.Ed Maste2015-04-141-1/+1
| | | | | | | | Patch by Eitan Adler. Differential Revision: http://reviews.llvm.org/D8514 llvm-svn: 234939
* Allow memory intrinsics to be tail callsKrzysztof Parzyszek2015-04-131-2/+2
| | | | llvm-svn: 234764
* [PowerPC] Really iterate over all loops in PPCLoopDataPrefetch/PPCLoopPreIncPrepHal Finkel2015-04-122-14/+6
| | | | | | | | When I fixed these a couple of days ago to iterate over all loops, not just depth == 1 loops, I inadvertently made it such that we'd only look at the first top-level loop. Make sure that we really look at all of them. llvm-svn: 234705
* [PowerPC] Disable part-word atomics on the P7Hal Finkel2015-04-111-2/+2
| | | | | | | As it turns out, even though these are part of ISA 2.06, the P7 does not support them (or, at least, not any P7s we're tested so far). llvm-svn: 234686
* Add direct moves to/from VSR and exploit them for FP/INT conversionsNemanja Ivanovic2015-04-118-1/+134
| | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D8928 It adds direct move instructions to/from VSX registers to GPR's. These are exploited for FP <-> INT conversions. llvm-svn: 234682
* Use 'override/final' instead of 'virtual' for overridden methodsAlexander Kornienko2015-04-113-3/+3
| | | | | | | | | | | | | | The patch is generated using clang-tidy misc-use-override check. This command was used: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \ -checks='-*,misc-use-override' -header-filter='llvm|clang' \ -j=32 -fix -format http://reviews.llvm.org/D8925 llvm-svn: 234679
* [PowerPC] Fix PPCLoopPreIncPrep for depth > 1 loopsHal Finkel2015-04-111-10/+27
| | | | | | | | | This pass had the same problem as the data-prefetching pass: it was only checking for depth == 1 loops in practice. Fix that, add some debugging statements, and make sure that, when we grab an AddRec, it is for the loop we expect. llvm-svn: 234670
* [PowerPC] Prefetching should also consider depth > 1 loopsHal Finkel2015-04-101-2/+5
| | | | | | | Iterating over loops from the LoopInfo instance only provides top-level loops. We need to search the whole tree of loops to find the inner ones. llvm-svn: 234603
* [PowerPC] Don't crash on PPC32 i64 fp_to_uint on modern coresHal Finkel2015-04-101-0/+1
| | | | | | | | | | When we have an instruction for this (and, thus, don't generate a runtime call), we need to custom type legalize this (in a trivial way, just as we do for fp_to_sint). Fixes PR23173. llvm-svn: 234561
* Add LLVM support for remaining integer divide and permute instructions from ↵Nemanja Ivanovic2015-04-096-52/+133
| | | | | | | | | | | ISA 2.06 This is the patch corresponding to review: http://reviews.llvm.org/D8406 It adds some missing instructions from ISA 2.06 to the PPC back end. llvm-svn: 234546
* clang-format bits of code to make a followup patch easy to read.Rafael Espindola2015-04-091-2/+1
| | | | llvm-svn: 234519
* Don't repeat name in comment. NFC.Rafael Espindola2015-04-091-6/+6
| | | | llvm-svn: 234506
* Refactor a lot of duplicated code for stub output.Rafael Espindola2015-04-071-19/+0
| | | | | | | This also moves it earlier so that it they are produced before we print an end symbol for the data section. llvm-svn: 234315
* [PowerPC] Enable splat generation for BUILD_VECTOR with little endianBill Schmidt2015-04-032-37/+2
| | | | | | | | | | | | | | | | | | | When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR nodes for little endian because wrong results were produced. I've subsequently investigated and found this is due to a call to BuildVectorSDNode::isConstantSplat that was always specifying big-endian. With this changed to correctly identify the target endianness, the optimizations work as expected. I found another case of a call to the same method with big-endian hardcoded, in PPC::isAllNegativeZeroVector(). I discovered this was an orphaned method with no callers, so I've just removed it. The existing test/CodeGen/PowerPC/vec_constants.ll checks these optimizations, so for testing I've just added a variant for little endian. llvm-svn: 234011
* [PowerPC] FastISel can't handle i1 return values when using CR bitsHal Finkel2015-04-011-0/+3
| | | | | | | | | | | | | | Under normal circumstances, use of CR bits is disabled when running at -O0, but it is enabled by default otherwise, and if you have optnone functions, they'll still generally be generated with crbits turned on (because nothing else turns them off). FastISel can't handle most things dealing with i1 values when using CR bits, and checks for that, but was not checking the return type on functions; we can't fast-isel function calls with i1 return values either when using CR bits for boolean values. Fixes PR22664. llvm-svn: 233775
* [PowerPC] Don't use a vector preferred memory type at -O0Hal Finkel2015-03-311-15/+17
| | | | | | | | | | | | Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic is a memset/memcpy/etc. we might normally use vector types. At -O0, this is probably not a good idea (because, if there is a bug in the lowering code, there would be no good way to turn it off). At -O0, only use scalar preferred types. Related to PR22754. llvm-svn: 233755
* [PowerPC] Remove TargetMachine CPU auto-detectionUlrich Weigand2015-03-311-6/+0
| | | | | | As was done for X86 in r206094. llvm-svn: 233684
* Replace the MCSubtargetInfo parameter with a Triple when creatingEric Christopher2015-03-311-5/+4
| | | | | | | an MCInstPrinter. Update all callers and use where we wanted a Triple previously. llvm-svn: 233648
* Remove unused Target argument from MCInstPrinter ctor functions.Eric Christopher2015-03-301-2/+1
| | | | llvm-svn: 233607
* [PowerPC] Add asm parser support for bitmask forms of rotate-and-mask ↵Hal Finkel2015-03-284-31/+95
| | | | | | | | | | | | | | | instructions The asm syntax for the 32-bit rotate-and-mask instructions can take a 32-bit bitmask instead of an (mb, me) pair. This syntax is not specified in the Power ISA manual, but is accepted by GNU as, and is documented in IBM's Assembler Language Reference. The GNU Multiple Precision Arithmetic Library (gmp) contains assembly that uses this syntax. To implement this, I moved the isRunOfOnes utility function from PPCISelDAGToDAG.cpp to PPCMCTargetDesc.h. llvm-svn: 233483
* [MCInstPrinter] Enable MCInstPrinter to change its behavior based on theAkira Hatanaka2015-03-272-2/+3
| | | | | | | | | | | | | | | | | | | | per-function subtarget. Currently, code-gen passes the default or generic subtarget to the constructors of MCInstPrinter subclasses (see LLVMTargetMachine::addPassesToEmitFile), which enables some targets (AArch64, ARM, and X86) to change their instprinter's behavior based on the subtarget feature bits. Since the backend can now use different subtargets for each function, instprinter has to be changed to use the per-function subtarget rather than the default subtarget. This patch takes the first step towards enabling instprinter to change its behavior based on the per-function subtarget. It adds a bit "PassSubtarget" to AsmWriter which tells table-gen to pass a reference to MCSubtargetInfo to the various print methods table-gen auto-generates. I will follow up with changes to instprinters of AArch64, ARM, and X86. llvm-svn: 233411
* Remove superfluous .str() and replace std::string concatenation with Twine.Yaron Keren2015-03-271-1/+1
| | | | llvm-svn: 233392
* Add computeFSAdditions to the function based subtarget creationEric Christopher2015-03-261-1/+9
| | | | | | | | | for PPC due to some unfortunate default setting via TargetMachine creation. I've added a FIXME on how this can be unraveled in the backend and a test to make sure we successfully legalize 64-bit things if we say we're 64-bits. llvm-svn: 233239
* Add Hardware Transactional Memory (HTM) SupportKit Barton2015-03-2516-32/+360
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07 (POWER8). The intrinsic support is based on GCC one [1], but currently only the 'PowerPC HTM Low Level Built-in Function' are implemented. The HTM instructions follows the RC ones and the transaction initiation result is set on RC0 (with exception of tcheck). Currently approach is to create a register copy from CR0 to GPR and comapring. Although this is suboptimal, since the branch could be taken directly by comparing the CR0 value, it generates code correctly on both test and branch and just return value. A possible future optimization could be elimitate the MFCR instruction to branch directly. The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on powerpc64 and powerpc64le. This is send along a clang patch to enabled the builtins and option switch. [1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html Phabricator Review: http://reviews.llvm.org/D8247 llvm-svn: 233204
* [APInt] Add an isSplat helper and use it in some places.Benjamin Kramer2015-03-251-11/+4
| | | | | | | To complement getSplat. This is more general than the binary decomposition method as it also handles non-pow2 splat sizes. llvm-svn: 233195
* Disabling warnings for MSVC build to enable /W4 use.Andrew Kaylor2015-03-241-2/+1
| | | | | | Differential Revision: http://reviews.llvm.org/D8572 llvm-svn: 233133
* Revert "Use std::bitset for SubtargetFeatures"Michael Kuperstein2015-03-241-1/+1
| | | | | | | | This reverts commit r233055. It still causes buildbot failures (gcc running out of memory on several platforms, and a self-host failure on arm), although less than the previous time. llvm-svn: 233068
* Use std::bitset for SubtargetFeaturesMichael Kuperstein2015-03-241-1/+1
| | | | | | | | | | | | | Previously, subtarget features were a bitfield with the underlying type being uint64_t. Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset. No functional change. The first time this was committed (r229831), it caused several buildbot failures. At least some of the ARM ones were due to gcc/binutils issues, and should now be fixed. Differential Revision: http://reviews.llvm.org/D8542 llvm-svn: 233055
* Remove the bare getSubtargetImpl call from the PPC port. As partEric Christopher2015-03-212-5/+1
| | | | | | | of this add a test that shows we can generate code with for functions that differ by subtarget feature. llvm-svn: 232882
* [ARM] Fix handling of thumb1 out-of-range frame offsetsJohn Brawn2015-03-202-2/+3
| | | | | | | | | | | | | | | | LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its answer when the base register changes. Unfortunately this isn't true in thumb1, where SP-based loads allow a larger offset than non-SP-based loads, and this causes the base register reuse code to generate instructions that are unencodable, causing an assertion failure. Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which ARMBaseRegisterInfo can then make use of to give the correct answer. Differential Revision: http://reviews.llvm.org/D8419 llvm-svn: 232825
* Split the object streamer callback in one per file format.Rafael Espindola2015-03-191-18/+11
| | | | | | | | | | | | | There are two main advantages to doing this * Targets that only need to handle one of the formats specially don't have to worry about the others. For example, x86 now only registers a constructor for the COFF streamer. * Changes to the arguments passed to one format constructor will not impact the other formats. llvm-svn: 232699
* two or more, use a for.Rafael Espindola2015-03-181-58/+30
| | | | llvm-svn: 232688
* [PowerPC] Correct typo in PPCInstrAltivec.tdBill Schmidt2015-03-181-1/+1
| | | | llvm-svn: 232681
* Centralize the handling of unique ids for temporary labels.Rafael Espindola2015-03-171-3/+2
| | | | | | | | | | | | | | | | Before this patch code wanting to create temporary labels for a given entity (function, cu, exception range, etc) had to keep its own counter to have stable symbol names. createTempSymbol would still add a suffix to make sure a new symbol was always returned, but it kept a single counter. Because of that, if we were to use just createTempSymbol("cu_begin"), the label could change from cu_begin42 to cu_begin43 because some other code started using temporary labels. Simplify this by just keeping one counter per prefix and removing the various specialized counters. llvm-svn: 232535
* Add assertion to detect invalid registers in the PowerPC MC instruction ↵Samuel Antao2015-03-171-0/+3
| | | | | | | | lowering. We have observed that noreg was being generated due to a bug in FastIsel and was not being detected during emission. It happens that in the Asm emission there is an assertion that detects this in getRegisterName() from the tbl-generated file PPCGenAsmWriter.inc. However, when emitting an Obj file, invalid registers can be emitted given that no check are made in getBinaryCodeFromInstr() from PPCGenMCCodeEmitter.inc. In order to cover all cases this adds an assertion for reg operands in LowerPPCMachineInstrToMCInst. llvm-svn: 232525
OpenPOWER on IntegriCloud