summaryrefslogtreecommitdiffstats
path: root/llvm
Commit message (Collapse)AuthorAgeFilesLines
* [WebAssembly] Update test expectationsDerek Schuff2015-12-121-108/+49
| | | | | | | | | Many tests are now passing due to eliminateFrameIndex implementation and the list needs to be re-triaged because it unblocks other failures, and some previous failures are different. However I'm about to churn it more by implementing more lowering, so will wait on that. llvm-svn: 255396
* Revert rL255391: [X86ISelLowering] Add additional support for ↵Chen Li2015-12-122-71/+3
| | | | | | | | multiplication-to-shift conversion. because it broke buildbot. llvm-svn: 255395
* use FileCheck for better checkingSanjay Patel2015-12-121-3/+22
| | | | llvm-svn: 255394
* [WebAssembly] Implement prolog/epilog insertion and FrameIndex eliminationDerek Schuff2015-12-1112-19/+1303
| | | | | | | | | | | | | | | | | | Summary: Use the SP32 physical register as the base for FrameIndex lowering. Update it and the __stack_pointer global var in the prolog and epilog. Extend the mapping of virtual registers to wasm locals to include the physical registers. Rather than modify the target-independent PrologEpilogInserter (which asserts that there are no virtual registers left) include a slightly-modified copy for Wasm that does not have this assertion and only clears the virtual registers if scavenging was needed (which of course it isn't for wasm). Differential Revision: http://reviews.llvm.org/D15344 llvm-svn: 255392
* [X86ISelLowering] Add additional support for multiplication-to-shift conversion.Chen Li2015-12-112-3/+71
| | | | | | | | | | | | Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this. Reviewers: craig.topper, RKSimon Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D14603 llvm-svn: 255391
* SamplePGO - Reduce memory utilization by 10x.Diego Novillo2015-12-113-61/+8
| | | | | | | | | | | | | | | DenseMap is the wrong data structure to use for sample records and call sites. The keys are too large, causing massive core memory growth when reading profiles. Before this patch, a 21Mb input profile was causing the compiler to grow to 3Gb in memory. By switching to std::map, the compiler now grows to 300Mb in memory. There still are some opportunities for memory footprint reduction. I'll be looking at those next. llvm-svn: 255389
* SelectionDAG: Match min/max if the scalar operation is legalMatt Arsenault2015-12-117-87/+361
| | | | llvm-svn: 255388
* Revert r248483, r242546, r242545, and r242409 - absdiff intrinsicsHal Finkel2015-12-1116-399/+35
| | | | | | | | | | | | | | | | | | | | After much discussion, ending here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html it has been decided that, instead of having the vectorizer directly generate special absdiff and horizontal-add intrinsics, we'll recognize the relevant reduction patterns during CodeGen. Accordingly, these intrinsics are not needed (the operations they represent can be pattern matched, as is already done in some backends). Thus, we're backing these out in favor of the current development work. r248483 - Codegen: Fix llvm.*absdiff semantic. r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation llvm-svn: 255387
* Avoid buffered reads of /dev/urandomRafael Espindola2015-12-111-4/+9
| | | | | | | | | | | | | | I am seeing disappointing clang performance on a large PowerPC64 Linux box. GetRandomNumberSeed() does a buffered read from /dev/urandom to seed its PRNG. As a result we read an entire page even though we only need 4 bytes. With every clang task reading a page worth of /dev/urandom we end up spending a large amount of time stuck on kernel spinlock. Patch by Anton Blanchard! llvm-svn: 255386
* [llvm-objdump/MachODump] Reduce code duplication.Davide Italiano2015-12-111-69/+41
| | | | llvm-svn: 255380
* Add tests for bitcast-bitcast sequences for all scalar/vector permutationsSanjay Patel2015-12-111-0/+90
| | | | | | As noted in http://reviews.llvm.org/D15392 , we should be able to improve this. llvm-svn: 255370
* [PGO] Revert r255365: solution incomplete, not handling lambda yetXinliang David Li2015-12-116-16/+8
| | | | llvm-svn: 255369
* [PGO] Stop using invalid char in instr variable names.Xinliang David Li2015-12-116-8/+16
| | | | | | | | | | | | | | Before the patch, -fprofile-instr-generate compile will fail if no integrated-as is specified when the file contains any static functions (the -S output is also invalid). This patch fixed the issue. With the change, the index format version will be bumped up by 1. Backward compatibility is preserved with this change. Differential Revision: http://reviews.llvm.org/D15243 llvm-svn: 255365
* CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()Matthias Braun2015-12-1110-150/+94
| | | | | | | | | | | | | | | | | | | | computeRegisterLiveness() was broken in that it reported dead for a register even if a subregister was alive. I assume this was because the results of analayzePhysRegs() are hard to understand with respect to subregisters. This commit: Changes the results of analyzePhysRegs (=struct PhysRegInfo) to be clearly understandable, also renames the fields to avoid silent breakage of third-party code (and improve the grammar). Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling it and removing workarounds for the bug. This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033 Differential Revision: http://reviews.llvm.org/D15320 llvm-svn: 255362
* Start replacing vector_extract/vector_insert with extractelt/inserteltMatt Arsenault2015-12-1112-102/+103
| | | | | | | | | | | | | | | | | | | | These are redundant pairs of nodes defined for INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT. insertelement/extractelement are slightly closer to the corresponding C++ node name, and has stricter type checking so prefer it. Update targets to only use these nodes where it is trivial to do so. AArch64, ARM, and Mips all have various type errors on simple replacement, so they will need work to fix. Example from AArch64: def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8), (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>; Which is trying to do sext_inreg i8, i8. llvm-svn: 255359
* [WebAssembly] Fix ADJCALLSTACKDOWN/UP use/defsDerek Schuff2015-12-112-5/+10
| | | | | | | | | | | | | | | | | | | | | | Summary: ADJCALLSTACK{DOWN,UP} (aka CALLSEQ_{START,END}) MIs are supposed to use and def the stack pointer. Since they do not, all the nodes are being eliminated by DeadMachineInstructionElim, so they aren't in the IR when PrologEpilogInserter/eliminateCallFramePseudo needs them. This change fixes that, but since RegStackify will not stackify across them (and it runs early, before PEI), change LowerCall to only emit them when the call frame size is > 0. That makes the current code work the same way and makes code handled by D15344 also work the same way. We can expand the condition beyond NumBytes > 0 in the future if needed. Reviewers: sunfish, jfb Subscribers: jfb, dschuff, llvm-commits Differential Revision: http://reviews.llvm.org/D15459 llvm-svn: 255356
* Revert r255247, r255265, and r255286 due to serious compile-time regressions.Chad Rosier2015-12-116-393/+93
| | | | | | | | Revert "[DSE] Disable non-local DSE to see if the bots go green." Revert "[DeadStoreElimination] Use range-based loops. NFC." Revert "[DeadStoreElimination] Add support for non-local DSE." llvm-svn: 255354
* CXX_FAST_TLS calling convention: target independent portion.Manman Ren2015-12-114-1/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by register allcoator. Register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given calling convention and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. rdar://problem/23557469 Differential Revision: http://reviews.llvm.org/D15340 llvm-svn: 255353
* fix typos; NFCSanjay Patel2015-12-111-3/+3
| | | | llvm-svn: 255352
* [dsymutil] Ignore absolute symbols in the debug mapFrederic Riss2015-12-114-3/+30
| | | | | | | | | | | | Quoting from the comment added to the code: // Objective-C on i386 uses artificial absolute symbols to // perform some link time checks. Those symbols have a fixed 0 // address that might conflict with real symbols in the object // file. As I cannot see a way for absolute symbols to find // their way into the debug information, let's just ignore those. llvm-svn: 255350
* AlignmentFromAssumptions and SLPVectorizer preserves AA and GlobalsAAHal Finkel2015-12-112-0/+7
| | | | | | | | | | | | GlobalsAA's assumptions that passes do not escape globals not previously escaped is not violated by AlignmentFromAssumptions and SLPVectorizer. Marking them as such allows GlobalsAA to be preserved until GVN in the LTO pipeline. http://lists.llvm.org/pipermail/llvm-dev/2015-December/092972.html Patch by Vaivaswatha Nagaraj! llvm-svn: 255348
* [TableGen] Correct Namespace lookup with AltNames in AsmWriterEmitterHal Finkel2015-12-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | AsmWriterEmitter will generate a getRegisterName function with an alternate register name index as its second argument if the target makes use of them. The enum of these values is generated in RegisterInfoEmitter. The getRegisterName generator would assume the namespace could always be found by reading index 1 of the list of AltNameIndices, but this will fail if this list is sorted such that the NoRegAltName is at index 1. Because this list is sorted by record name (in CodeGenTarget::ReadRegAltNameIndices), you only run in to problems if your MyTargetRegisterInfo.td defines a single RegAltNameIndex that sorts lexically before NoRegAltName. For example, if a target has something like def AnAltNameIndex : RegAltNameIndex and defines RegAltNameIndices for some registers then, prior to this change, AsmWriterEmitter would generate references to ::AnAltNameIndex and ::NoRegAltName Patch by Alex Bradbury! llvm-svn: 255344
* PruneEH pass incorrectly reports that a change was madeArtur Pilipenko2015-12-111-12/+7
| | | | | | | | Reviewed By: reames Differential Revision: http://reviews.llvm.org/D14097 llvm-svn: 255343
* [Mem2Reg] Respect optnoneJames Molloy2015-12-112-0/+24
| | | | | | | | Mem2Reg shouldn't be optimizing a function that is marked optnone. There is a test checking this that fails when mem2reg is explicitly added to the standard pass pipeline. llvm-svn: 255336
* [InstCombine] Make MatchBSwap also match bit reversalsJames Molloy2015-12-113-103/+250
| | | | | | MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse. llvm-svn: 255334
* Revert previous test commit.Maxim Ostapenko2015-12-111-1/+0
| | | | llvm-svn: 255331
* This is a test commit to check my commit access works.Maxim Ostapenko2015-12-111-0/+1
| | | | llvm-svn: 255330
* [PGO] Read VP raw data without depending on the Value fieldXinliang David Li2015-12-112-16/+14
| | | | | | | | | | | | | | | | Before this patch, each function's on-disk VP data is 'pointed' to by the Value field of per-function ProfileData structue, and read relies on this field (relocated with ValueDataDelta field) to read the value data. However this means the Value field needs to be updated during runtime before dumping, which creates undesirable data races. With this patch, the reading of VP data no longer depends on Value field. There is no format change. ValueDataDelta header field becomes obsolute but will be kept for compatibility reason (will be removed next time the raw format change is needed). llvm-svn: 255329
* Fix build after r255319.Hans Wennborg2015-12-111-1/+1
| | | | llvm-svn: 255322
* Fix a spurious if.Eric Christopher2015-12-111-1/+1
| | | | llvm-svn: 255321
* [LazyValueInfo] Stop inserting overdefined values into ValueCache toAkira Hatanaka2015-12-111-18/+48
| | | | | | | | | | | | | | | | | | reduce memory usage. Previously, LazyValueInfoCache inserted overdefined lattice values into both ValueCache and OverDefinedCache. This wasn't necessary and was causing LazyValueInfo to use an excessive amount of memory in some cases. This patch changes LazyValueInfoCache to insert overdefined values only into OverDefinedCache. The memory usage decreases by 70 to 75% when one of the files in llvm is compiled. rdar://problem/11388615 Differential revision: http://reviews.llvm.org/D15391 llvm-svn: 255320
* [PPC]: Peephole optimize small accesss to aligned globals.Kyle Butt2015-12-112-9/+361
| | | | | | | | | | | | | | | | | | | | | | | Access to aligned globals gives us a chance to peephole optimize nonzero offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't overflow the available displacement. For example: addis 3, 2, b4v@toc@ha addi 4, 3, b4v@toc@l lbz 5, b4v@toc@l(3) ; This is the result of the current peephole lbz 6, 1(4) ; optimizer lbz 7, 2(4) lbz 8, 3(4) If b4v is 4-byte aligned, we can skip using register 4 because we know that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate: addis 3, 2, b4v@toc@ha lbz 4, b4v@toc@l(3) lbz 5, b4v@toc@l+1(3) lbz 6, b4v@toc@l+2(3) lbz 7, b4v@toc@l+3(3) Saving a register and an addition. Larger alignments allow larger structures/arrays to be optimized. llvm-svn: 255319
* Check in the script for building Win snapshotsHans Wennborg2015-12-111-0/+93
| | | | llvm-svn: 255318
* [ProfileData] clang-format TextInstrProfReader::hasFormat. NFC.Vedant Kumar2015-12-111-2/+3
| | | | llvm-svn: 255317
* [X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1.Cong Hou2015-12-113-5/+435
| | | | | | | | | | | | Previously in the conversion cost table there are no entries for integer-integer conversions on SSE2. This will result in imprecise costs for certain vectorized operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers are counted from the result of running llc on the new test case in this patch. Differential revision: http://reviews.llvm.org/D15132 llvm-svn: 255315
* Format fix (NFC)Xinliang David Li2015-12-101-2/+4
| | | | llvm-svn: 255313
* s/need/needsEric Christopher2015-12-101-2/+2
| | | | llvm-svn: 255306
* Fix (bitcast (fabs x)), (bitcast (fneg x)) and (bitcast (fcopysign cst,Eric Christopher2015-12-102-0/+171
| | | | | | | | | | | | x)) combines for ppc_fp128, since signbit computation is more complicated. Discussion thread: http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html Patch by Tim Shen! llvm-svn: 255305
* Attempt to fix the ReST compilation to html of the C API docs.Eric Christopher2015-12-101-14/+14
| | | | llvm-svn: 255304
* More non-ascii quote characters.Eric Christopher2015-12-101-2/+2
| | | | llvm-svn: 255303
* Clarify some of the wording on adding a new subcomponent to theEric Christopher2015-12-101-2/+2
| | | | | | C API. llvm-svn: 255302
* Fix non-ascii quotes.Eric Christopher2015-12-101-4/+4
| | | | llvm-svn: 255301
* Add C API guidelines to the developer policy to match discussionsEric Christopher2015-12-101-0/+27
| | | | | | on the llvm mailing lists. llvm-svn: 255300
* PPC: Teach FMA mutate to respect register classes.Kyle Butt2015-12-102-2/+98
| | | | | | | | | This was causing bad code gen and assembly that won't assemble, as mixed altivec and vsx code would end up with a vsx high register assigned to an altivec instruction, which won't work. Constraining the classes allows the optimization to proceed. llvm-svn: 255299
* [CMake] Add LLVM_BUILD_INSTRUMENTED option to enable building with ↵Chris Bieneman2015-12-101-0/+8
| | | | | | | | -fprofile-instr-generate This is the first step in supporting PGO data generation via CMake. I've marked the option as advanced and experimental until it is fleshed out further. llvm-svn: 255298
* [LibFuzzer] Introducing FUZZER_FLAG_UNSIGNED and using it for seeding.Mike Aizatsky2015-12-105-9/+25
| | | | | | | | Differential Revision: http://reviews.llvm.org/D15339 done llvm-svn: 255296
* EarlyCSE: add testsJF Bastien2015-12-101-10/+68
| | | | | | | | | | | | Summary: As a follow-up to rL255054 I wasn't able to convince myself that the code did what I thought, so I wrote more tests. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15371 llvm-svn: 255295
* Add a forward declaration (NFC)Xinliang David Li2015-12-101-0/+1
| | | | llvm-svn: 255292
* Delete a duplicate branch in IfConversion.cpp. NFC.Cong Hou2015-12-101-9/+0
| | | | llvm-svn: 255291
* [DAGCombiner] Fix PR25763 - vector comparison constant folding + sign-extensionSimon Pilgrim2015-12-102-5/+24
| | | | | | PR25763 demonstrated an issue with D14683 - vector comparison constant folding only works for i1 results, so we need to split off the sign-extension of the result to the required type. Luckily this can be done with the existing type legalization code. llvm-svn: 255289
OpenPOWER on IntegriCloud