summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [LPM] Fix PR18616 where the shifts to the loop pass manager to extractChandler Carruth2014-01-282-16/+19
| | | | | | | | | | | | | | | | | | | | | | | | | LCSSA from it caused a crasher with the LoopUnroll pass. This crasher is really nasty. We destroy LCSSA form in a suprising way. When unrolling a loop into an outer loop, we not only need to restore LCSSA form for the outer loop, but for all children of the outer loop. This is somewhat obvious in retrospect, but hey! While this seems pretty heavy-handed, it's not that bad. Fundamentally, we only do this when we unroll a loop, which is already a heavyweight operation. We're unrolling all of these hypothetical inner loops as well, so their size and complexity is already on the critical path. This is just adding another pass over them to re-canonicalize. I have a test case from PR18616 that is great for reproducing this, but pretty useless to check in as it relies on many 10s of nested empty loops that get unrolled and deleted in just the right order. =/ What's worse is that investigating this has exposed another source of failure that is likely to be even harder to test. I'll try to come up with test cases for these fixes, but I want to get the fixes into the tree first as they're causing crashes in the wild. llvm-svn: 200273
* [TLI] Add a new hook to TargetLowering to query the target if a load of a ↵Juergen Ributzka2014-01-286-11/+40
| | | | | | | | | | | | | | | | | constant should be converted to simply the constant itself. Before this patch we used getIntImmCost from TargetTransformInfo to determine if a load of a constant should be converted to just a constant, but the threshold for this was set to an arbitrary value. This value works well for the two targets (X86 and ARM) that implement this target-hook, but it isn't target-independent at all. Now targets have the possibility to decide directly if this optimization should be performed. The default value is set to false to preserve the current behavior. The target hook has been moved to TargetLowering, which removed the last use and need of TargetTransformInfo in SelectionDAG. llvm-svn: 200271
* LoopVectorize: Support conditional stores by scalarizingArnold Schwaighofer2014-01-281-29/+194
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vectorizer takes a loop like this and widens all instructions except for the store. The stores are scalarized/unrolled and hidden behind an "if" block. for (i = 0; i < 128; ++i) { if (a[i] < 10) a[i] += val; } for (i = 0; i < 128; i+=2) { v = a[i:i+1]; v0 = (extract v, 0) + 10; v1 = (extract v, 1) + 10; if (v0 < 10) a[i] = v0; if (v1 < 10) a[i] = v1; } The vectorizer relies on subsequent optimizations to sink instructions into the conditional block where they are anticipated. The flag "vectorize-num-stores-pred" controls whether and how many stores to handle this way. Vectorization of conditional stores is disabled per default for now. This patch also adds a change to the heuristic when the flag "enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small loops until load/store ports are saturated. This heuristic uses TTI's getMaxUnrollFactor as a measure for load/store ports. I also added a second flag -enable-cond-stores-vec. It will enable vectorization of conditional stores. But there is no cost model for vectorization of conditional stores in place yet so this will not do good at the moment. rdar://15892953 Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll -vectorize-num-stores-pred=1 (before the BFI change): Performance Regressions: Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower) Applications/siod/siod 2.18% Performance improvements: mesa -4.42% libquantum -4.15% With a patch that slightly changes the register heuristics (by subtracting the induction variable on both sides of the register pressure equation, as the induction variable is probably not really unrolled): Performance Regressions: Benchmarks/Ptrdist/yacr2/yacr2 7.73% Applications/siod/siod 1.97% Performance Improvements: libquantum -13.05% (we now also unroll quantum_toffoli) mesa -4.27% llvm-svn: 200270
* Revert r199871 and replace it with a simple check in the debug infoEric Christopher2014-01-2813-28/+38
| | | | | | | | | code to see if we're emitting a function into a non-default text section. This is still a less-than-ideal solution, but more contained than r199871 to determine whether or not we're emitting code into an array of comdat sections. llvm-svn: 200269
* Reformat slightly.Eric Christopher2014-01-271-5/+1
| | | | llvm-svn: 200264
* PGO branch weight: keep halving the weights until they can fit intoManman Ren2014-01-271-12/+13
| | | | | | | | | | uint32. When folding branches to common destination, the updated branch weights can exceed uint32 by more than factor of 2. We should keep halving the weights until they can fit into uint32. llvm-svn: 200262
* Fix the "#ifndef HAVE_SYS_WAIT_H" code path in Program.inc to compileMark Seaborn2014-01-271-0/+1
| | | | | | Without this fix, WaitResult is not defined. llvm-svn: 200259
* ARM MC: Fix the initial DWARF CFI unwind info at the start of a functionMark Seaborn2014-01-271-2/+8
| | | | | | | | | | | | | | | | | | | | This brings MC into line with GNU 'as' on ARM, and it brings the ARM target into line with most other LLVM targets, which declare the initial CFI state with addInitialFrameState(). Without this, functions generated with .cfi_startproc/endproc on ARM will tend to cause GDB to abort with: gdb/dwarf2-frame.c:1132: internal-error: Unknown CFA rule. I've also tested this by comparing the output of "readelf -w" on the object files produced by llvm-mc and gas when given the .s file added here. This change is part of addressing PR18636. Differential Revision: http://llvm-reviews.chandlerc.com/D2597 llvm-svn: 200255
* Fix sext(setcc) -> select_cc using wrong type for setcc.Matt Arsenault2014-01-271-10/+16
| | | | | | | | | | | | | | | | Also update the comment, since it actually produces a select (setcc) instead of select_cc. It was checking and using the setcc result type for the type of the sext, instead of the type of the compared items. In my problem case, the sext was to i32 and was used as the setcc type, but the expected type was i64. No test since I haven't been able to hit the problem with this on any in-tree targets. llvm-svn: 200249
* Fix unsupported addressing mode assertion for pldDavid Peixotto2014-01-272-2/+6
| | | | | | | | | | | | | | | | | | | Summary: This commit gives an address mode to the PLD instruction. We were getting an assertion failure in the frame lowering code because we had code that was doing a pld of a stack allocated address. The frame lowering was checking the address mode and then asserting because pld had none defined. This commit fixes pld for arm mode. There was a previous fix for thumb mode in a separate commit. The commit for thumb mode added a test in a separate file because it would otherwise fail for arm. This commit moves the thumb test back into the prefetch.ll file and adds the corresponding arm test. Differential Revision: http://llvm-reviews.chandlerc.com/D2622 llvm-svn: 200248
* test commit: add minor commentGautam Chakrabarti2014-01-271-1/+1
| | | | llvm-svn: 200244
* [DAGCombiner] Teach how to fold sext/aext/zext of constant build vectors.Andrea Di Biagio2014-01-271-9/+64
| | | | | | | | | | | | | This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when the operand in input is a build vector of constants (or UNDEFs). The inability to fold a sext/zext of a constant build_vector was the root cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support. Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a ConstantSDNode. llvm-svn: 200234
* MC: Add support for .cfi_startproc simpleDavid Majnemer2014-01-275-17/+34
| | | | | | | | | | | | | This commit allows LLVM MC to process .cfi_startproc directives when they are followed by an additional `simple' identifier. This signals to elide the emission of target specific CFI instructions that would normally occur initially. This fixes PR16587. Differential Revision: http://llvm-reviews.chandlerc.com/D2624 llvm-svn: 200227
* [vectorize] Initial version of respecting PGO in the vectorizer: treatChandler Carruth2014-01-271-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cold loops as-if they were being optimized for size. Nothing fancy here. Simply test case included. The nice thing is that we can now incrementally build on top of this to drive other heuristics. All of the infrastructure work is done to get the profile information into this layer. The remaining work necessary to make this a fully general purpose loop unroller for very hot loops is to make it a fully general purpose loop unroller. Things I know of but am not going to have time to benchmark and fix in the immediate future: 1) Don't disable the entire pass when the target is lacking vector registers. This really doesn't make any sense any more. 2) Teach the unroller at least and the vectorizer potentially to handle non-if-converted loops. This is trivial for the unroller but hard for the vectorizer. 3) Compute the relative hotness of the loop and thread that down to the various places that make cost tradeoffs (very likely only the unroller makes sense here, and then only when dealing with loops that are small enough for unrolling to not completely blow out the LSD). I'm still dubious how useful hotness information will be. So far, my experiments show that if we can get the correct logic for determining when unrolling actually helps performance, the code size impact is completely unimportant and we can unroll in all cases. But at least we'll no longer burn code size on cold code. One somewhat unrelated idea that I've had forever but not had time to implement: mark all functions which are only reachable via the global constructors rigging in the module as optsize. This would also decrease the impact of any more aggressive heuristics here on code size. llvm-svn: 200219
* ConstantHoisting: We can't insert instructions directly in front of a PHI node.Benjamin Kramer2014-01-271-3/+18
| | | | | | Insert before the terminating instruction of the dominating block instead. llvm-svn: 200218
* XCore: Fix typo in function name.Benjamin Kramer2014-01-273-10/+10
| | | | llvm-svn: 200216
* [vectorizer] Add an override for the target instruction cost and use itChandler Carruth2014-01-271-0/+11
| | | | | | | to stabilize a test that really is trying to test generic behavior and not a specific target's behavior. llvm-svn: 200215
* [vectorizer] Simplify code to use existing helpers on the FunctionChandler Carruth2014-01-271-8/+8
| | | | | | | | | | object and fewer pointless variables. Also, add a clarifying comment and a FIXME because the code which disables *all* vectorization if we can't use implicit floating point instructions just makes no sense at all. llvm-svn: 200214
* [vectorizer] Teach the loop vectorizer's unroller to only unroll byChandler Carruth2014-01-271-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | powers of two. This is essentially always the correct thing given the impact on alignment, scaling factors that can be used in addressing modes, etc. Also, fix the management of the unroll vs. small loop cost to more accurately model things with this world. Enhance a test case to actually exercise more of the unroll machinery if using synthetic constants rather than a specific target model. Before this change, with the added flags this test will unroll 3 times instead of either 2 or 4 (the two sensible answers). While I don't expect this to make a huge difference, if there are lots of loops sitting right on the edge of hitting the 'small unroll' factor, they might change behavior. However, I've benchmarked moving the small loop cost up and down in many various ways and by a huge factor (2x) without seeing more than 0.2% code size growth. Small adjustments such as the series that led up here have led to about 1% improvement on some benchmarks, but it is very close to the noise floor so I mostly checked that nothing regressed. Let me know if you see bad behavior on other targets but I don't expect this to be a sufficiently dramatic change to trigger anything. llvm-svn: 200213
* [vectorizer] Add some flags which are useful for conducting experimentsChandler Carruth2014-01-271-2/+38
| | | | | | | | | | | | with the unrolling behavior in the loop vectorizer. No functionality changed at this point. These are a bit hack-y, but talking with Hal, there doesn't seem to be a cleaner way to easily experiment with different thresholds here and he was also interested in them so I wanted to commit them. Suggestions for improvement are very welcome here. llvm-svn: 200212
* [vectorizer] Fix a trivial oversight where we always requested theChandler Carruth2014-01-271-4/+4
| | | | | | | | | | | | | | number of vector registers rather than toggling between vector and scalar register number based on VF. I don't have a test case as I spotted this by inspection and on X86 it only makes a difference if your target is lacking SSE and thus has *no* vector registers. If someone wants to add a test case for this for ARM or somewhere else where this is more significant, that would be awesome. Also made the variable name a bit more sensible while I'm here. llvm-svn: 200211
* Fix crasher introduced in r200203 and caught by a libc++ buildbot. Don't ↵Nick Lewycky2014-01-271-1/+3
| | | | | | assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else! llvm-svn: 200210
* Teach SCEV to handle more cases of 'and X, CST', specifically where CST is ↵Nick Lewycky2014-01-271-22/+84
| | | | | | | | | | any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits. Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary. Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown. llvm-svn: 200203
* Fix for PR18102.Stepan Dyatkovskiy2014-01-271-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue outcomes from DAGCombiner::MergeConsequtiveStores, more precisely from mem-ops sequence sorting. Consider, how MergeConsequtiveStores works for next example: store i8 1, a[0] store i8 2, a[1] store i8 3, a[1] ; a[1] again. return ; DAG starts here 1. Method will collect all the 3 stores. 2. It sorts them by distance from the base pointer (farthest with highest index). 3. It takes first consecutive non-overlapping stores and (if possible) replaces them with a single store instruction. The point is, we can't determine here which 'store' instruction would be the second after sorting ('store 2' or 'store 3'). It happens that 'store 3' would be the second, and 'store 2' would be the third. So after merging we have the next result: store i16 (1 | 3 << 8), base ; is a[0] but bit-casted to i16 store i8 2, a[1] So actually we swapped 'store 3' and 'store 2' and got wrong contents in a[1]. Fix: In sort routine just also take into account mem-op sequence number. llvm-svn: 200201
* [vectorizer] Clean up the handling of unvectorized loop unrolling in theChandler Carruth2014-01-271-13/+3
| | | | | | | | | | | | | | | | | LoopVectorize pass. The logic here doesn't make much sense. We *only* unrolled if the unvectorized loop was a reduction loop with a single basic block *and* small loop body. The reduction part in particular doesn't make much sense. Instead, if we just fall through to the vectorized unroll logic it makes more sense of unrolling if there is a vectorized reduction that could be hacked on by the SLP vectorizer *or* if the loop is small. This is mostly a cleanup and nothing in the test suite really exercises this, but I did run benchmarks across this change and saw no really significant changes. llvm-svn: 200198
* R600/SI: Add intrinsic for BUFFER_LOAD_DWORD* instructionsMichel Danzer2014-01-273-21/+101
| | | | | Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 200196
* R600/SI: Add intrinsic for S_SENDMSG instructionMichel Danzer2014-01-275-2/+54
| | | | | Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 200195
* Roll back the ConstStringRef change for nowAlp Toker2014-01-271-1/+1
| | | | | | | | | | | There are a couple of interesting things here that we want to check over (particularly the expecting asserts in StringRef) and get right for general use in ADT so hold back on this one. For clang we have a workable templated solution to use in the meanwhile. This reverts commit r200187. llvm-svn: 200194
* Print .mask and .fmask with the target streamer.Rafael Espindola2014-01-274-22/+46
| | | | | | | Testing this also found the missing '\n' after .frame that this patch also fixes. llvm-svn: 200192
* StringRef: Extend constexpr capabilities and introduce ConstStringRefAlp Toker2014-01-271-1/+1
| | | | | | | | | | | | | | | | | | | (1) Add llvm_expect(), an asserting macro that can be evaluated as a constexpr expression as well as a runtime assert or compiler hint in release builds. This technique can be used to construct functions that are both unevaluated and compiled depending on usage. (2) Update StringRef using llvm_expect() to preserve runtime assertions while extending the same checks to static asserts in C++11 builds that support the feature. (3) Introduce ConstStringRef, a strong subclass of StringRef that references compile-time constant strings. It's convertible to, but not from, ordinary StringRef and thus can be used to add compile-time safety to various interfaces in LLVM and clang that only accept fixed inputs such as diagnostic format strings that tend to get misused. llvm-svn: 200187
* Print .frame via the target streamer.Rafael Espindola2014-01-273-5/+21
| | | | llvm-svn: 200186
* [AArch64 NEON] Try to generate CONCAT_VECTOR when lowering BUILD_VECTOR or ↵Kevin Qin2014-01-272-29/+95
| | | | | | | | SHUFFLE_VECTOR. Replace r199791. llvm-svn: 200180
* Revert r199791.Kevin Qin2014-01-272-90/+29
| | | | | | It's old version which has some bugs. I'll commit lattest patch soon. llvm-svn: 200179
* Use SwitchSection in MipsAsmPrinter::EmitStartOfAsmFile.Rafael Espindola2014-01-271-16/+17
| | | | llvm-svn: 200178
* Remove dead code.Rafael Espindola2014-01-272-25/+0
| | | | llvm-svn: 200174
* Add back spaces I missed in the conversion to emitRawComments.Rafael Espindola2014-01-271-3/+3
| | | | | | Sorry about that. llvm-svn: 200171
* Use emitRawComment instead of EmitRawText.Rafael Espindola2014-01-271-4/+5
| | | | llvm-svn: 200170
* Add missing file.Rafael Espindola2014-01-271-0/+27
| | | | llvm-svn: 200169
* Add a XCoreTargetStreamer and port over the simple uses of EmitRawText.Rafael Espindola2014-01-262-9/+64
| | | | llvm-svn: 200167
* ARM: improve diagnostics for .word directiveSaleem Abdulrasool2014-01-261-1/+3
| | | | | | | | | If a complex expression was passed to the .word directive and the first part of the directive failed to parse, a secondary diagnostic would be produced that would clutter the error diagnostics. Improve the diagnostics by consuming the remainder of the statement. llvm-svn: 200160
* AsmParser: improve diagnostics for invalid variantsSaleem Abdulrasool2014-01-261-1/+2
| | | | | | | | | An emitted diagnostic for an invalid relocation variant would place the caret on the token following the relocation variant indicator or at the end of the line if there was no following token. This change corrects the placement of the caret to point to the token. llvm-svn: 200159
* MC: whitespaceSaleem Abdulrasool2014-01-261-57/+56
| | | | | | Fix indentation, remove unnecessary line. NFC. llvm-svn: 200158
* Avoid C++ comment in C sourcesAlp Toker2014-01-261-1/+1
| | | | | | lib/Target/X86/Disassembler/X86DisassemblerDecoder.c:1361:7: error: C++ style comments are not allowed in ISO C90 llvm-svn: 200153
* Follow up of r200095. Code clean up.Evan Cheng2014-01-261-1/+1
| | | | llvm-svn: 200152
* Clean up the Legal/Expand logic for SPARC popc.Jakob Stoklund Olesen2014-01-262-5/+8
| | | | llvm-svn: 200141
* Implement the missing bits corresponding to .mips_hack_elf_flags.Rafael Espindola2014-01-265-86/+47
| | | | | | | | | | | | These were: * noreorder handling on the target object streamer and asm parser. * setting the initial flag bits based on the enabled features. * setting the elf header flag for micromips It is *really* depressing I am the one doing this instead of someone at mips actually taking the time to understand the infrastructure. llvm-svn: 200138
* Pass a MCSubtargetInfo down to the TargetStreamer creation.Rafael Espindola2014-01-269-9/+15
| | | | | | | With this the target streamers will be able to know the target features that are in use. llvm-svn: 200135
* Only generate the popc instruction for SPARC CPUs that implement it.Jakob Stoklund Olesen2014-01-264-6/+13
| | | | | | | The popc instruction is defined in the SPARCv9 instruction set architecture, but it was emulated on CPUs older than Niagara 2. llvm-svn: 200131
* Fix swapped CASA operands.Jakob Stoklund Olesen2014-01-261-2/+2
| | | | | | Found by SingleSource/UnitTests/AtomicOps.c llvm-svn: 200130
* Construct the MCStreamer before constructing the MCTargetStreamer.Rafael Espindola2014-01-2622-92/+111
| | | | | | | | | | This has a few advantages: * Only targets that use a MCTargetStreamer have to worry about it. * There is never a MCTargetStreamer without a MCStreamer, so we can use a reference. * A MCTargetStreamer can talk to the MCStreamer in its constructor. llvm-svn: 200129
OpenPOWER on IntegriCloud