path: root/llvm
Commit log (each entry shows subject, author, date, files changed, and lines removed/added)
* R600: Implement enableClusterLoads() (Matt Arsenault, 2014-07-24, 2 files changed, -0/+7)
  llvm-svn: 213831
* [AArch64] Fix a bug generating an incorrect instruction when building a small vector. (Kevin Qin, 2014-07-24, 2 files changed, -38/+70)
  This bug was introduced by r211144. The element of the operand may be smaller than the element of the result, but the previous commit could only handle the contrary condition. This commit handles this scenario and generates optimized code like ZIP1.
  llvm-svn: 213830
* [AArch64] Disable some optimization cases for type conversion from sint to fp, because those optimization cases are micro-architecture dependent and only make sense for Cyclone. (Jiangning Liu, 2014-07-24, 2 files changed, -4/+30)
  A new predicate, Cyclone, is introduced in the .td file.
  llvm-svn: 213827
* Fixed PR20411 - bug in getINSERTPS() (Filipe Cabecinhas, 2014-07-24, 2 files changed, -0/+32)
  When we had a vector_shuffle with an input from each vector, we could miscompile it because we assumed the input from V2 wouldn't be moved from where it was on the vector. Added a test case.
  llvm-svn: 213826
* IR: Add Value::sortUseList() (Duncan P. N. Exon Smith, 2014-07-24, 3 files changed, -0/+176)
  Add `Value::sortUseList()`, templated on the comparison function to use. The sort is an iterative merge sort that uses a binomial vector of already-merged lists to limit the size overhead to `O(1)`. This is part of PR5680.
  llvm-svn: 213824
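The "binomial vector" idea above can be sketched outside LLVM: keep at most one sorted run of 2**i nodes in slot i, and adding a node "carries" like binary addition. This Python model is a hypothetical stand-in (`Node`, `sort_use_list`, and the comparison callback are not LLVM's names), not the actual use-list code:

```python
class Node:
    """Hypothetical singly linked list node standing in for a Use."""
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def sort_use_list(head, cmp_less):
    def merge(a, b):
        # Standard two-way merge of two sorted runs.
        dummy = Node(None)
        tail = dummy
        while a and b:
            if cmp_less(b.val, a.val):
                tail.next, b = b, b.next
            else:
                tail.next, a = a, a.next
            tail = tail.next
        tail.next = a or b
        return dummy.next

    slots = []  # slots[i] holds a sorted run of 2**i nodes, or None
    node = head
    while node:
        run, node = node, node.next
        run.next = None                    # detach a single-node run
        i = 0
        while i < len(slots) and slots[i] is not None:
            run = merge(slots[i], run)     # "carry", like binary addition
            slots[i] = None
            i += 1
        if i == len(slots):
            slots.append(None)
        slots[i] = run
    result = None
    for s in slots:                        # fold the remaining runs together
        if s is not None:
            result = merge(s, result) if result else s
    return result
```

Only the pointers in `slots` are extra state; the list nodes themselves are relinked in place, which is what keeps the overhead constant per node.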
* Add a VS "14" msbuild toolset (Reid Kleckner, 2014-07-23, 5 files changed, -0/+75)
  This allows people to try clang inside MSBuild with the VS "14" CTP releases. Fixes PR20341.
  Patch by Marcel Raad!
  llvm-svn: 213819
* SimplifyCFG: fix a bug in switch-to-table conversion (Manman Ren, 2014-07-23, 2 files changed, -4/+54)
  We use a gep to access the global array "switch.table", and the table index should be treated as unsigned. When the highest bit is 1, this commit zero-extends the index to an integer type with a larger size. For a switch on i2, we used to generate:
    %switch.tableidx = sub i2 %0, -2
    getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx
  This is incorrect when %switch.tableidx is 2 or 3. The fix is to generate:
    %switch.tableidx = sub i2 %0, -2
    %switch.tableidx.zext = zext i2 %switch.tableidx to i3
    getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext
  rdar://17735071
  llvm-svn: 213815
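To see why the zext matters: getelementptr sign-extends an index narrower than the pointer width, so the i2 values 2 and 3 (high bit set) become negative offsets. A small Python model of the two widening choices (`gep_index` is a hypothetical helper written for illustration, not LLVM code):

```python
def gep_index(value, bits, zext_first):
    """Model how a `bits`-wide gep index reaches pointer width. Without an
    explicit zext the index is sign-extended, so values with the high bit
    set (2 or 3 for i2) turn negative and would index before the table."""
    if zext_first:
        return value & ((1 << bits) - 1)        # zext: always non-negative
    sign = 1 << (bits - 1)                       # otherwise: sign-extend
    return (value & (sign - 1)) - (value & sign)

table = [10, 20, 30, 40]                         # models @switch.table
for idx in range(4):                             # every i2 value stays in bounds
    assert 0 <= gep_index(idx, 2, zext_first=True) < len(table)
assert gep_index(2, 2, zext_first=False) == -2   # would read before the table
assert gep_index(3, 2, zext_first=False) == -1
```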
* Fix the build when building with only the ARM backend. (Rafael Espindola, 2014-07-23, 1 file changed, -1/+1)
  llvm-svn: 213814
* Document what backwards compatibility we provide for bitcode. (Rafael Espindola, 2014-07-23, 1 file changed, -0/+23)
  llvm-svn: 213813
* Let llvm/test/CodeGen/X86/avx512*-mask-op.ll(s) be aware of the Win32 x64 calling convention. (NAKAMURA Takumi, 2014-07-23, 3 files changed, -10/+10)
  llvm-svn: 213812
* Fix indenting. (Eric Christopher, 2014-07-23, 1 file changed, -13/+14)
  llvm-svn: 213811
* [x86] Rip out some broken test cases for avx512 i1 store support. (Chandler Carruth, 2014-07-23, 1 file changed, -29/+0)
  It isn't reasonable to test storing things through undef pointers -- storing through those is at best "good luck" and really should be transformed to "unreachable". Random changes in the combiner can randomly break these tests for no good reason. I'm following up on the original commit regarding the right long-term strategy here.
  llvm-svn: 213810
* Reorganize and simplify local variables. (Eric Christopher, 2014-07-23, 1 file changed, -13/+11)
  llvm-svn: 213809
* Finish inverting the MC -> Object dependency. (Rafael Espindola, 2014-07-23, 8 files changed, -9/+9)
  There were still some disassembler bits in lib/MC, but their use of Object was only visible in the includes they used, not in the symbols.
  llvm-svn: 213808
* [RuntimeDyld][AArch64] Update relocation tests and also add a simple GOT test. (Juergen Ributzka, 2014-07-23, 1 file changed, -25/+40)
  llvm-svn: 213807
* Remove the query for TargetMachine and TargetInstrInfo since we're already inside TargetInstrInfo. (Eric Christopher, 2014-07-23, 1 file changed, -3/+1)
  llvm-svn: 213806
* ArgPromo+DebugInfo: Handle updating debug info over multiple applications of argument promotion. (David Blaikie, 2014-07-23, 2 files changed, -9/+14)
  While the subprogram map cache used by Dead Argument Elimination works there, I made a mistake when reusing it for Argument Promotion in r212128, because ArgPromo may transform functions more than once whereas DAE transforms each function only once, removing all the dead arguments in one go. To address this, ensure that the map is updated after each argument promotion.
  In retrospect it might be a little wasteful to create a map of all subprograms when only handling a single CGSCC, but the alternative is walking the debug info for each function in the CGSCC that gets updated. It's not clear to me what the right tradeoff is there, but since the current tradeoff seems to be working OK (and the code to keep things updated is very cheap), let's stick with that for now.
  llvm-svn: 213805
* Test debug info in arg promotion with an actual promotion case, rather than a degenerate arg promotion that's actually DAE performed by ArgPromo. (David Blaikie, 2014-07-23, 1 file changed, -5/+8)
  Also the debug location I had here was bogus, describing the location of the call site as in the callee - and unnecessary, so just drop it.
  llvm-svn: 213803
* Use an explicit triple in testcase. (Jim Grosbach, 2014-07-23, 1 file changed, -1/+1)
  Make the test work better on non-darwin hosts. Hopefully.
  llvm-svn: 213801
* [X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants. (Jim Grosbach, 2014-07-23, 4 files changed, -6/+44)
  The transform to constant fold unary operations with an AND across a vector comparison also applies when the constant is not a splat of a scalar.
  llvm-svn: 213800
* X86: restrict combine to when type sizes are safe. (Jim Grosbach, 2014-07-23, 4 files changed, -9/+51)
  The folding of unary operations through a vector compare and mask operation is only safe if the unary operation result is of the same size as its input. For example, it's not safe for [su]itofp from v4i32 to v4f64.
  llvm-svn: 213799
* DAG: fp->int conversion for non-splat constants. (Jim Grosbach, 2014-07-23, 2 files changed, -13/+26)
  Constant fold the lanes of the input constant build_vector individually, so we correctly handle the case where the vector elements are not all the same constant value. PR20394
  llvm-svn: 213798
* [NVPTX] Add some extra tests for mul.wide to test non-power-of-two source types (Justin Holewinski, 2014-07-23, 1 file changed, -0/+22)
  llvm-svn: 213794
* [NVPTX] Silence a GCC warning found by the buildbots (Justin Holewinski, 2014-07-23, 1 file changed, -1/+1)
  The cast to NVPTXTargetLowering was missing a 'const', but let's just access the right pointer through the subtarget anyway.
  llvm-svn: 213793
* Do not add unroll disable metadata after the unrolling pass for loops with #pragma clang loop unroll(full). (Mark Heffernan, 2014-07-23, 2 files changed, -17/+50)
  llvm-svn: 213789
* [FastISel][AArch64] Fix return type in FastLowerCall. (Juergen Ributzka, 2014-07-23, 2 files changed, -4/+16)
  I used the wrong method to obtain the return type inside FinishCall. This fix simply uses the return type from FastLowerCall, which we already determined to be a valid type.
  Reduced test case from Chad. Thanks.
  llvm-svn: 213788
* [NVPTX] mul.wide generation works for any smaller integer source types, not just the next smaller power of two. (Justin Holewinski, 2014-07-23, 2 files changed, -2/+24)
  llvm-svn: 213784
* [SKX] Added missed test files for rev 213757 (Robert Khasanov, 2014-07-23, 4 files changed, -0/+236)
  llvm-svn: 213780
* AsmParser: remove deprecated LLIR support (Saleem Abdulrasool, 2014-07-23, 4 files changed, -36/+0)
  linker_private and linker_private_weak were deprecated in 3.5. Remove support for them now that the 3.5 branch has been created.
  llvm-svn: 213777
* ExecutionEngine: remove a stray semicolon (Saleem Abdulrasool, 2014-07-23, 1 file changed, -1/+1)
  Detected via GCC 4.8 [-Wpedantic].
  llvm-svn: 213776
* [SKX] Fix lowercase "error:" in rev 213757 (Robert Khasanov, 2014-07-23, 1 file changed, -22/+22)
  llvm-svn: 213774
* [NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled (Justin Holewinski, 2014-07-23, 2 files changed, -11/+19)
  With optimizations disabled, we disable the isel patterns for mul.wide, but we were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE ISD nodes in DAGCombine if the optimization level is not zero.
  llvm-svn: 213773
* In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full". (Mark Heffernan, 2014-07-23, 4 files changed, -103/+65)
  llvm-svn: 213772
* test commit: remove trailing space (Alex Lorenz, 2014-07-23, 1 file changed, -5/+5)
  llvm-svn: 213770
* [AArch64] Lower sdiv x, pow2 using add + select + shift. (Chad Rosier, 2014-07-23, 5 files changed, -3/+140)
  The target-independent DAGCombiner will generate:
    asr w1, X, #31          ; w1 = splat sign bit.
    add X, X, w1, lsr #28   ; X = X + 0 or pow2-1
    asr w0, X, asr #4       ; w0 = X/pow2
  However, the add + shifts is expensive, so generate:
    add w0, X, 15           ; w0 = X + pow2-1
    cmp X, wzr              ; X - 0
    csel X, w0, X, lt       ; X = (X < 0) ? X + pow2-1 : X;
    asr w0, X, asr 4        ; w0 = X/pow2
  llvm-svn: 213758
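The trick in both sequences is the same: signed division by a power of two must truncate toward zero, so negative inputs get biased by pow2-1 before the arithmetic shift. A quick Python check of that identity (`sdiv_pow2` is a hypothetical name; Python ints stand in for fixed-width registers):

```python
def sdiv_pow2(x, log2):
    """Signed divide by 2**log2 with round-toward-zero, via the
    bias-then-arithmetic-shift trick from the commit above."""
    bias = (1 << log2) - 1      # pow2 - 1
    if x < 0:                   # the csel: only negative inputs get the bias
        x += bias
    return x >> log2            # arithmetic shift right

for x in (-37, -16, -1, 0, 1, 15, 37):
    want = int(x / 16)          # C-style truncating division
    assert sdiv_pow2(x, 4) == want, (x, sdiv_pow2(x, 4), want)
```

Without the bias, `x >> log2` would round toward negative infinity (e.g. -1 >> 4 is -1, not 0), which is why the sign-dependent add is needed at all.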
* [SKX] Enabling mask instructions: encoding, lowering (Robert Khasanov, 2014-07-23, 6 files changed, -28/+199)
  KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ
  Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>
  llvm-svn: 213757
* ARM: spot SBFX-compatible code expressed with sign_extend_inreg (Tim Northover, 2014-07-23, 3 files changed, -2/+39)
  We were assuming all SBFX-like operations would have the shl/asr form, but often when the field being extracted is an i8 or i16, we end up with a SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for, though.
  llvm-svn: 213754
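The two forms extract the same signed bitfield. A Python sketch of the equivalence (`sbfx` and `via_shl_asr` are hypothetical helpers written for illustration; a SIGN_EXTEND_INREG of a shifted value corresponds to the lsb > 0 cases):

```python
def sbfx(x, lsb, width):
    """Model a signed bitfield extract (like ARM's SBFX): pull `width`
    bits starting at `lsb`, then sign-extend the field's top bit."""
    field = (x >> lsb) & ((1 << width) - 1)
    sign = 1 << (width - 1)
    return (field ^ sign) - sign

def via_shl_asr(x, lsb, width, bits=32):
    # The canonical shl/asr form: shift the field to the top of a
    # `bits`-wide register, then arithmetic-shift it back down.
    v = (x << (bits - lsb - width)) & ((1 << bits) - 1)
    if v & (1 << (bits - 1)):
        v -= 1 << bits          # reinterpret as two's complement
    return v >> (bits - width)

for x in (0x12345678, 0xFFFF8000, 0x80000001):
    for lsb, width in ((0, 8), (8, 16), (4, 5)):
        assert sbfx(x, lsb, width) == via_shl_asr(x, lsb, width)
```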
* ARM: add patterns for [su]xta[bh] from just a shift. (Tim Northover, 2014-07-23, 4 files changed, -6/+58)
  Although the final shifter operand is a rotate, this actually only matters for the half-word extends when the amount == 24. Otherwise folding a shift in is just as good.
  llvm-svn: 213753
* Enable partial libcall inlining for all targets by default. (James Molloy, 2014-07-23, 3 files changed, -2/+5)
  This pass attempts to speculatively use a sqrt instruction if one exists on the target, falling back to a libcall if the target instruction returned NaN. This was enabled for MIPS and System-Z, but it is well guarded and is good for most targets - GCC does this for (that I've checked) X86, ARM and AArch64.
  llvm-svn: 213752
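The shape of the partially-inlined call can be sketched as follows; `hardware_sqrt` and `libcall_sqrt` are hypothetical stand-ins for the target instruction and the library routine (which in C would also set errno):

```python
import math

calls = []  # records when the slow path is taken

def libcall_sqrt(v):
    """Stand-in for the library sqrt call (the slow, errno-setting path)."""
    calls.append(v)
    return float("nan")

def hardware_sqrt(v):
    """Stand-in for the target's sqrt instruction: NaN for negative input."""
    return math.sqrt(v) if v >= 0 else float("nan")

def sqrt_partially_inlined(x):
    # Fast path first; fall back to the libcall only when it yields NaN.
    r = hardware_sqrt(x)
    if r != r:                  # NaN compares unequal to itself
        r = libcall_sqrt(x)
    return r

assert sqrt_partially_inlined(4.0) == 2.0 and calls == []
sqrt_partially_inlined(-1.0)
assert calls == [-1.0]          # libcall reached only on the NaN path
```

The win is that the common case (non-negative input) never pays for the call; only inputs that produce NaN take the branch to the library.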
* [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STRB instructions. (Tilmann Scheller, 2014-07-23, 2 files changed, -1/+21)
  The ARM ARM prohibits STRB instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STRB instructions with unpredictable behavior.
  llvm-svn: 213750
* Added release notes for MIPS. (Daniel Sanders, 2014-07-23, 1 file changed, -1/+69)
  llvm-svn: 213749
* AArch64: remove "arm64_be" support in favour of "aarch64_be".Tim Northover2014-07-2319-63/+31
| | | | | | | | | There really is no arm64_be: it was a useful fiction to test big-endian support while both backends existed in parallel, but now the only platform that uses the name (iOS) doesn't have a big-endian variant, let alone one called "arm64_be". llvm-svn: 213748
* [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STR instructions. (Tilmann Scheller, 2014-07-23, 2 files changed, -0/+30)
  The ARM ARM prohibits STR instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STR instructions with unpredictable behavior.
  llvm-svn: 213745
* AArch64: remove arm64 triple enumerator. (Tim Northover, 2014-07-23, 12 files changed, -47/+24)
  Having both Triple::arm64 and Triple::aarch64 is extremely confusing, and invites bugs where only one is checked. In reality, the only legitimate difference between the two (arm64 usually means iOS) is also present in the OS part of the triple, and that's what should be checked. We still parse the "arm64" triple, just canonicalise it to Triple::aarch64, so there aren't any LLVM-side test changes.
  llvm-svn: 213743
* Revert r211771. It was: "[X86] Improve the selection of SSE3/AVX addsub instructions". (Andrea Di Biagio, 2014-07-23, 2 files changed, -196/+0)
  This change fully reverts r211771. That revision added a canonicalization rule which has the potential to cause a combine-cycle in the target-independent canonicalizing DAG combine. The plan is to move the logic that forms target-specific addsub nodes into the lowering of shuffles.
  llvm-svn: 213736
* [x86] Clean up a test case to use check labels and spell out the exact instruction sequences with CHECK-NEXT for these test cases. (Chandler Carruth, 2014-07-23, 1 file changed, -74/+116)
  This notably exposes how absolutely horrible the generated code is for several of these test cases, and will make any future updates to the test visible as our vector instruction selection gets better.
  llvm-svn: 213732
* [ARM] Add regression test for the earlyclobber constraint of ARM STRB. (Tilmann Scheller, 2014-07-23, 1 file changed, -0/+10)
  The constraint was added in r213369.
  llvm-svn: 213730
* [ARM] Add earlyclobber constraint to pre/post-indexed ARM STRH instructions. (Tilmann Scheller, 2014-07-23, 2 files changed, -3/+15)
  The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted. The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined. However, it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically, and then the constraint is definitely needed.
  llvm-svn: 213729
* [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions. (Chandler Carruth, 2014-07-23, 16 files changed, -273/+93)
  The old behavior could cause arbitrarily bad memory usage in the DAG combiner if there was heavy traffic of adding nodes already on the worklist to it. This commit switches the DAG combine worklist to work the same way as the instcombine worklist, where we null out removed entries and only add new entries to the worklist. My measurements of codegen time show slight improvement. The memory utilization is unsurprisingly dominated by other factors (the IR and DAG itself, I suspect).
  This change results in subtle, frustrating churn in the particular order in which DAG combines are applied, which causes a number of minor regressions where we fail to match a pattern previously matched by accident. AFAICT, all of these should be using AddToWorklist directly or should be written in a less brittle way. None of the changes seem drastically bad, and a few of the changes seem distinctly better.
  A major change required to make this work is to significantly harden the way in which the DAG combiner handles nodes which become dead (zero uses). Previously, we relied on the ability to "priority-bump" them on the combine worklist to achieve recursive deletion of these nodes and ensure that the frontier of remaining live nodes all were added to the worklist. Instead, I've introduced a routine to just implement that precise logic with no indirection. It is a significantly simpler operation than that of the combiner worklist proper. I suspect this will also fix some other problems with the combiner.
  I think the x86 changes are really minor and uninteresting, but the avx512 change at least is hiding a "regression" (despite the test case being just noise, not testing some performance invariant) that might be looked into. Not sure if any of the others impact specific "important" code paths, but they didn't look terribly interesting to me, or the changes were really minor.
  The consensus in review is to fix any regressions that show up after the fact here. Thanks to the other reviewers for checking the output on other architectures. There is a specific regression on ARM that Tim already has a fix prepped to commit.
  Differential Revision: http://reviews.llvm.org/D4616
  llvm-svn: 213727
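An instcombine-style worklist of the kind described above can be sketched in a few lines; this Python model (`CombineWorklist` is a hypothetical name, not LLVM's class) shows the two key properties: duplicate inserts are O(1) no-ops, and removal nulls out the slot rather than shifting entries:

```python
class CombineWorklist:
    """Deduplicating worklist: each node appears at most once, and removal
    nulls out the slot so the vector never shifts and never re-grows from
    re-adding nodes that are already pending."""
    def __init__(self):
        self.items = []     # vector of nodes; removed slots become None
        self.index = {}     # node -> position in items
    def add(self, node):
        if node not in self.index:          # duplicate insert: do nothing
            self.index[node] = len(self.items)
            self.items.append(node)
    def remove(self, node):
        pos = self.index.pop(node, None)
        if pos is not None:
            self.items[pos] = None          # null out; no shifting
    def pop(self):
        while self.items:
            node = self.items.pop()
            if node is not None:
                del self.index[node]
                return node
        return None                         # worklist drained

wl = CombineWorklist()
for n in ("a", "b", "a", "c", "a"):         # heavy duplicate traffic
    wl.add(n)
assert len(wl.items) == 3                   # did not grow per duplicate
wl.remove("b")
assert [wl.pop(), wl.pop(), wl.pop()] == ["c", "a", None]
```

Contrast with a priority-bumping scheme, where re-adding an already-pending node appends another entry and the vector can grow without bound under the traffic pattern the commit describes.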
* We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. (Nick Lewycky, 2014-07-23, 2 files changed, -5/+17)
  We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405!
  llvm-svn: 213726