summaryrefslogtreecommitdiffstats
path: root/llvm
Commit message (Collapse)AuthorAgeFilesLines
...
* Use the 'arcp' fast-math-flag when combining repeated FP divisorsSanjay Patel2015-10-272-15/+70
| | | | | | | | | | | | This is a usage of the IR-level fast-math-flags now that they are propagated to SDNodes. This was originally part of D8900. Removing the global 'enable-unsafe-fp-math' checks will require auto-upgrade and possibly other changes. Differential Revision: http://reviews.llvm.org/D9708 llvm-svn: 251450
* [ScalarEvolutionExpander] PHI on a catchpad can be used on both edgesDavid Majnemer2015-10-272-11/+47
| | | | | | | | A PHI on a catchpad might be used by both edges out of the catchpad, feeding back into a loop. In this case, just use the insertion point. Anything more clever would require new basic blocks or PHI placement. llvm-svn: 251442
* [AArch64]Merge halfword loads into a 32-bit loadJun Bum Lim2015-10-272-45/+265
| | | | | | | | | | | | | | | | This recommits r250719, which caused a failure in SPEC2000.gcc because of the incorrect insert point for the new wider load. Convert two halfword loads into a single 32-bit word load with bitfield extract instructions. For example : ldrh w0, [x2] ldrh w1, [x2, #2] becomes ldr w0, [x2] ubfx w1, w0, #16, #16 and w0, w0, #ffff llvm-svn: 251438
* Whitespace.NAKAMURA Takumi2015-10-271-1/+1
| | | | llvm-svn: 251437
* Revert r251291, "Loop Vectorizer - skipping "bitcast" before GEP"NAKAMURA Takumi2015-10-273-141/+89
| | | | | | | It causes miscompilation of llvm/lib/ExecutionEngine/Interpreter/Execution.cpp. See also PR25324. llvm-svn: 251436
* Tidy a comment. NFC.Diego Novillo2015-10-271-1/+1
| | | | llvm-svn: 251434
* Create a new interface addSuccessorWithoutWeight(MBB*) in MBB to add ↵Cong Hou2015-10-277-30/+48
| | | | | | | | | | | | | | successors when optimization is disabled. When optimization is disabled, edge weights that are stored in MBB won't be used so that we don't have to store them. Currently, this is done by adding successors with default weight 0, and if all successors have default weights, the weight list will be empty. But that the weight list is empty doesn't mean disabled optimization (as is stated several times in MachineBasicBlock.cpp): it may also mean all successors just have default weights. We should discourage using default weights when adding successors, because it is very easy for users to forget update the correct edge weights instead of using default ones (one exception is that the MBB only has one successor). In order to detect such usages, it is better to differentiate using default weights from the case when optimizations is disabled. In this patch, a new interface addSuccessorWithoutWeight(MBB*) is created for when optimization is disabled. In this case, MBB will try to maintain an empty weight list, but it cannot guarantee this as for many uses of addSuccessor() whether optimization is disabled or not is not checked. But it can guarantee that if optimization is enabled, then the weight list always has the same size of the successor list. Differential revision: http://reviews.llvm.org/D13963 llvm-svn: 251429
* [SLP] Be more aggressive about reduction width selection.Charlie Turner2015-10-272-12/+158
| | | | | | | | | | | | | | | | | | | Summary: This change could be way off-piste, I'm looking for any feedback on whether it's an acceptable approach. It never seems to be a problem to gobble up as many reduction values as can be found, and then to attempt to reduce the resulting tree. Some of the workloads I'm looking at have been aggressively unrolled by hand, and by selecting reduction widths that are not constrained by a vector register size, it becomes possible to profitably vectorize. My test case shows such an unrolling which SLP was not vectorizing (on neither ARM nor X86) before this patch, but with it does vectorize. I measure no significant compile time impact of this change when combined with D13949 and D14063. There are also no significant performance regressions on ARM/AArch64 in SPEC or LNT. The more principled approach I thought of was to generate several candidate tree's and use the cost model to pick the cheapest one. That seemed like quite a big design change (the algorithms seem very much one-shot), and would likely be a costly thing for compile time. This seemed to do the job at very little cost, but I'm worried I've misunderstood something! Reviewers: nadav, jmolloy Subscribers: mssimpso, llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D14116 llvm-svn: 251428
* [SLP] Try a bit harder to find reduction PHIsCharlie Turner2015-10-272-5/+117
| | | | | | | | | | | | | | | Summary: Currently, when the SLP vectorizer considers whether a phi is part of a reduction, it dismisses phi's whose incoming blocks are not the same as the block containing the phi. For the patterns I'm looking at, extending this rule to allow phis whose incoming block is a containing loop latch allows me to vectorize certain workloads. There is no significant compile-time impact, and combined with D13949, no performance improvement measured in ARM/AArch64 in any of SPEC2000, SPEC2006 or LNT. Reviewers: jmolloy, mcrosier, nadav Subscribers: mssimpso, nadav, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D14063 llvm-svn: 251425
* [SLP] Treat SelectInsts as reduction values.Charlie Turner2015-10-272-6/+80
| | | | | | | | | | | | | | | Summary: Certain workloads, in particular sum-of-absdiff loops, can be vectorized using SLP if it can treat select instructions as reduction values. The test case is a bit awkward. The AArch64 cost model needs some tuning to not be so pessimistic about selects. I've had to tweak the SLP threshold here. Reviewers: jmolloy, mzolotukhin, spatel, nadav Subscribers: nadav, mssimpso, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D13949 llvm-svn: 251424
* [Orc] Fix indentation.Lang Hames2015-10-271-35/+35
| | | | llvm-svn: 251423
* Fix SamplePGO segfault when debug info is missing.Diego Novillo2015-10-273-2/+45
| | | | | | | | | | When emitting a remark for a conditional branch annotation, the remark uses the line location information of the conditional branch in the message. In some cases, that information is unavailable and the optimization would segfaul. I'm still not sure whether this is a bug or WAI, but the optimizer should not die because of this. llvm-svn: 251420
* [ms-inline-asm] Leave alignment in bytes if the native assembler uses bytesReid Kleckner2015-10-271-3/+9
| | | | | | | | | | | | | The existing behavior was correct on Darwin, which is probably the platform it was written for. Before this change, we would rewrite "align 8" to ".align 3" and then fail to make it through the integrated assembler because 3 is not a power of 2. Differential Revision: http://reviews.llvm.org/D14120 llvm-svn: 251418
* Rename qsort -> multikey_qsort. NFC.Rui Ueyama2015-10-271-4/+4
| | | | | | `qsort` as a file-scope local function name was confusing. llvm-svn: 251414
* Prefer ranlib mode over ar mode.Ed Schouten2015-10-271-2/+2
| | | | | | | | | | | | | | | | | For CloudABI's toolchain I have a symlink that goes from <target>-ar and <target>-ranlib to LLVM's ar binary, to mimick GNU Binutils' naming scheme. The problem is that if we're targetting ARM64, the name of the ranlib executable is aarch64-unknown-cloudabi-ranlib. This already contains the string "ar". Let's move the "ranlib" test above the "ar" test. It's not that likely that we're going to see operating systems or harwdare architectures that are called "ranlib". Reviewed by: rafael Differential Revision: http://reviews.llvm.org/D14123 llvm-svn: 251413
* [CMake] Get rid of LLVM_DYLIB_EXPORT_ALL, and make it the default, add ↵Chris Bieneman2015-10-272-42/+47
| | | | | | | | | | | | | | | | | | | libLLVM-C on darwin to cover the C API needs. Summary: We've had a lot of discussion in the past about the meaningful and useful default behaviors for the llvm-shlib tool. The original implementation was heavily geared toward Apple's use, and I think that was wrong. This patch seeks to correct that. I've removed the LLVM_DYLIB_EXPORT_ALL variable and made libLLVM export everything by default. I've also added a new target that is only built on Darwin for libLLVM-C as a library that re-exports the LLVM-C API. This library is not built on Linux because ELF doesn't support re-export libraries in the same way MachO does. Reviewers: chapuni, resistor, bogner, axw Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13842 llvm-svn: 251411
* [X86][AVX512] [X86][AVX512] add convert float to halfAsaf Badouh2015-10-277-22/+230
| | | | | | | | convert float to half with mask/maskz for the reg to reg version and mask for the reg to mem version (there is no maskz version for reg to mem). Differential Revision: http://reviews.llvm.org/D14113 llvm-svn: 251409
* [ARM] Expand ROTL and ROTR of vector value typesCharlie Turner2015-10-274-1/+37
| | | | | | | | | | | | Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection. Reviewers: rengolin, t.p.northover Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D14082 llvm-svn: 251401
* Do not use "else" when both branches return (NFC)Mehdi Amini2015-10-271-2/+1
| | | | | From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 251398
* [ScalarEvolutionExpander] Properly insert no-op casts + EH PadsDavid Majnemer2015-10-272-15/+188
| | | | | | | | | | | We want to insert no-op casts as close as possible to the def. This is tricky when the cast is of a PHI node and the BasicBlocks between the def and the use cannot hold any instructions. Iteratively walk EH pads until we hit a non-EH pad. This fixes PR25326. llvm-svn: 251393
* [X86] Make elfiamcu an OS, not an environment.Michael Kuperstein2015-10-275-12/+12
| | | | | | | | | | GNU tools require elfiamcu to take up the entire OS field, so, e.g. i?86-*-linux-elfiamcu is not considered a legal triple. Make us compatible. Differential Revision: http://reviews.llvm.org/D14081 llvm-svn: 251390
* [SimplifyLibCalls] Use range-based loop. No functional change.Davide Italiano2015-10-271-4/+2
| | | | llvm-svn: 251383
* Convert cost table lookup functions to return a pointer to the entry or ↵Craig Topper2015-10-274-168/+142
| | | | | | | | | | nullptr instead of the index. This avoid mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer. While there make use of ArrayRef and std::find_if. llvm-svn: 251382
* [function-attrs] Refactor code to handle shorter code with early exits.Chandler Carruth2015-10-271-31/+37
| | | | | | | | | | | No functionality changed here, but the indentation is substantially reduced and IMO the code is much easier to read. I've also added some helpful comments. This is just a clean-up I wrote while studying the code, and that has been in my backlog for a while. llvm-svn: 251381
* [ValueTracking] Don't special case wrapped ConstantRanges; NFCISanjoy Das2015-10-271-3/+1
| | | | | | | | | | | | | | | | | | | Use `getUnsignedMax` directly instead of special casing a wrapped ConstantRange. The previous code would have been "buggy" (and this would have been a semantic change) if LLVM allowed !range metadata to denote full ranges. E.g. in %val = load i1, i1* %ptr, !range !{i1 1, i1 1} ;; == full set ValueTracking would conclude that the high bit (IOW the only bit) in %val was zero. Since !range metadata does not allow empty or full ranges, this change is just a minor stylistic improvement. llvm-svn: 251380
* [x86] replace integer logic ops with packed SSE FP logic opsSanjay Patel2015-10-272-20/+38
| | | | | | | | | | | | | | | | | | | If we have an operand to a bitwise logic op that's already in an XMM register and the result is going to be sent to an XMM register, then use an SSE logic op to avoid moves between the integer and vector register files. Related commits: http://reviews.llvm.org/rL248395 http://reviews.llvm.org/rL248399 http://reviews.llvm.org/rL248404 http://reviews.llvm.org/rL248409 http://reviews.llvm.org/rL248415 This should solve PR22428: https://llvm.org/bugs/show_bug.cgi?id=22428 llvm-svn: 251378
* [SCEV] Refactor out ScalarEvolution::getDataLayout; NFCSanjoy Das2015-10-272-17/+19
| | | | llvm-svn: 251375
* Fix llc crash processing S/UREM for -Oz builds caused by rL250825.Steve King2015-10-272-5/+278
| | | | | | | | | | | | | | When taking the remainder of a value divided by a constant, visitREM() attempts to convert the REM to a longer but faster sequence of instructions. This conversion calls combine() on a speculative DIV instruction. Commit rL250825 may cause this combine() to return a DIVREM, corrupting nearby nodes. Flow eventually hits unreachable(). This patch adds a test case and a check to prevent visitREM() from trying to convert the REM instruction in cases where a DIVREM is possible. See http://reviews.llvm.org/D14035 llvm-svn: 251373
* add FP logic test cases to show current codegen (PR22428)Sanjay Patel2015-10-261-0/+60
| | | | llvm-svn: 251370
* [mips][ias] Fold needsExpansion() and expandInstruction() together. NFC.Daniel Sanders2015-10-261-122/+83
| | | | | | | | | | | | | | Summary: Previously we maintained two separate switch statements that had to be kept in sync. This patch merges them into a single switch. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D14012 llvm-svn: 251369
* Switch ownership of miscellaneous ARM target to myself.Tim Northover2015-10-261-2/+2
| | | | llvm-svn: 251367
* [x86] Make the vselect-minmax test 2x to 3x faster by deleting all theChandler Carruth2015-10-261-4032/+960
| | | | | | | instructions that aren't relevant for instruction selection of vector min and max. llvm-svn: 251366
* Use Twin instead of std::to_string.Oleksiy Vyalov2015-10-261-4/+3
| | | | | | http://reviews.llvm.org/D14095 llvm-svn: 251365
* Fix indents. It's a follow up to r251353.Ivan Krasin2015-10-261-2/+2
| | | | llvm-svn: 251364
* [LLVMSymbolize] Don't use LLVMSymbolizer::Options in ModuleInfo. NFC.Alexey Samsonov2015-10-262-20/+21
| | | | | | | | | LLVMSymbolizer::Options is mostly used in LLVMSymbolizer class anyway. Let's keep their usage restricted to that class, especially given that it's worth to move ModuleInfo to a different header, independent from the symbolizer class. llvm-svn: 251363
* reorganize logic; NFCI (retry r251349)Sanjay Patel2015-10-261-13/+13
| | | | | | | | | This is a preliminary step before adding another optimization to PerformBITCASTCombine(). ..and I really hope it's NFC this time! llvm-svn: 251357
* Move imported entities into DwarfCompilationUnit to speed up LTO linking.Ivan Krasin2015-10-264-22/+14
| | | | | | | | | | | | | | | | Summary: In particular, this CL speeds up the official Chrome linking with LTO by 1.8x. See more details in https://crbug.com/542426 Reviewers: dblaikie Subscribers: jevinskie Differential Revision: http://reviews.llvm.org/D13918 llvm-svn: 251353
* ARM: make sure VFP loads and stores are properly aligned.Tim Northover2015-10-262-10/+110
| | | | | | | Both VLDRS and VLDRD fault if the memory is not 4 byte aligned, which wasn't really being checked before, leading to faults at runtime. llvm-svn: 251352
* revert r251349; it included code for a functional changeSanjay Patel2015-10-261-33/+14
| | | | llvm-svn: 251350
* reorganize logic; NFCISanjay Patel2015-10-261-14/+33
| | | | | | | This is a preliminary step before adding another optimization to PerformBITCASTCombine(). llvm-svn: 251349
* Initialize BasicAAWrapperPass in it's constructorKeno Fischer2015-10-262-1/+5
| | | | | | | | | | | | Summary: This idiom is used elsewhere in LLVM, but was overlooked here. Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13628 llvm-svn: 251348
* Fix build failure on GCC 4.7 (old libstdc++ doesn't have std::map::emplace).Alexey Samsonov2015-10-261-2/+3
| | | | llvm-svn: 251347
* Remove use of std::map<>::emplace which is not supported on some older ↵David Blaikie2015-10-261-1/+1
| | | | | | versions of libstdc++ llvm-svn: 251346
* Remove unused local variable. NFC.Diego Novillo2015-10-261-2/+0
| | | | llvm-svn: 251344
* Fix tests.Peter Collingbourne2015-10-261-1/+1
| | | | llvm-svn: 251343
* ARM/ELF: Restore original (pre-r251322) logic for deciding whether to use GOT.Peter Collingbourne2015-10-262-2/+2
| | | | | | | Unbreaks linking with gold, which cannot resolve direct relocations referring to global symbols. llvm-svn: 251342
* [LLVMSymbolize] Use symbol table only if function linkage name was requested.Alexey Samsonov2015-10-262-3/+5
| | | | | | | Now it's enough to just specify -functions=short without additionally providing -use-symbol-table=false. llvm-svn: 251339
* Fix build error by fully qualifying llvm::make_unique.Alexey Samsonov2015-10-261-1/+1
| | | | llvm-svn: 251338
* Optimize StringTableBuilder.Rui Ueyama2015-10-261-14/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is a patch to improve StringTableBuilder's performance. That class' finalize function is very hot particularly in LLD because the function does tail-merge strings in string tables or SHF_MERGE sections. Generic std::sort-style sorter is not efficient for sorting strings. The function implemented in this patch seems to be more efficient. Here's a benchmark of LLD to link Clang with or without this patch. The numbers are medians of 50 runs. -O0 real 0m0.455s real 0m0.430s (5.5% faster) -O3 real 0m0.487s real 0m0.452s (7.2% faster) Since that is a benchmark of the whole linker, the speedup of StringTableBuilder itself is much more than that. http://reviews.llvm.org/D14053 llvm-svn: 251337
* [LLVMSymbolize] Use std::unique_ptr more extensively to clarify ownership.Alexey Samsonov2015-10-262-14/+14
| | | | llvm-svn: 251336
OpenPOWER on IntegriCloud