summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
...
* DebugInfo: Include .dwo file name when hashing multiple CUs in a single fileDavid Blaikie2017-05-293-3/+13
| | | | | | | | | | | | | | | | | | | | This is really a workaround for ThinLTO in particular - since it can import partial CUs that may end up looking very similar/the same as the same partial import in another ThinLTO compile. An alternative fix would be to change the DICompileUnit metadata to include a "primary file" or the like - and when importing for ThinLTO set the primary file to the name of the DICompileUnit that is being imported into. This involves changing the schema and would reduce the excessive uniqueness in the hash that this change creates - allowing diagnosing of more duplicate CUs than will be caught with this change. But duplicate CUs can still be caught in non-ThinLTO builds & are mostly a nuisance rather than a particularly deliberate/effective tool for finding broken code. (arguably the hash could always include the dwo file and nothing in fission would break, I think..) llvm-svn: 304119
* Prune trailing whitespace. (To regenerate makefiles)NAKAMURA Takumi2017-05-281-2/+2
| | | | llvm-svn: 304112
* DebugInfo: Omit an empty CU when a subprogram was moved into its useDavid Blaikie2017-05-281-8/+12
| | | | | | | | When the only use of a CU is for a subprogram that's only emitted into the using CU (to avoid cross-CU references in DWO files), avoid creating that CU at all. llvm-svn: 304111
* [DAGCombiner] use narrow load to avoid vector extractSanjay Patel2017-05-271-0/+50
| | | | | | | | | | | | | | | | | | If we have (extract_subvector(load wide vector)) with no other users, that can just be (load narrow vector). This is intentionally conservative. Follow-ups may loosen the one-use constraint to account for the extract cost or just remove the one-use check. The memop chain updating is based on code that already exists multiple times in x86 lowering, so that should be pulled into a helper function as a follow-up. Background: this is a potential improvement noticed via regressions caused by making x86's peekThroughBitcasts() not loop on consecutive bitcasts (see comments in D33137). Differential Revision: https://reviews.llvm.org/D33578 llvm-svn: 304072
* AArch64/PEI: Do not add reserved regs to liveinsMatthias Braun2017-05-271-1/+2
| | | | | | | We do not track liveness for reserved registers. It is unnecessary to add them to block livein lists. llvm-svn: 304059
* ScheduleDAGInstrs: Fix fixupKills()Matthias Braun2017-05-273-159/+51
| | | | | | | | | | | | Rewrite fixupKills() to use the LivePhysRegs class. Simplifies the code and fixes a bug where the CSR registers in return blocks where missed leading to invalid kill flags. Also remove the unnecessary rule that we wouldn't set kill flags on tied operands. No tests as I have an upcoming commit improving MachineVerifier checks to catch these cases in multiple existing lit tests. llvm-svn: 304055
* [GlobalISel] Add a localizer pass for target to useQuentin Colombet2017-05-273-0/+127
| | | | | | | | | | | | | This reverts commit r299287 plus clean-ups. The localizer pass is a helper pass that could be run at O0 in the GISel pipeline to work around the deficiency of the fast register allocator. It basically shortens the live-ranges of the constants so that the allocator does not spill all over the place. Long term fix would be to make the greedy allocator fast. llvm-svn: 304051
* BranchRelaxation: computeLiveIns() after creating new blockMatthias Braun2017-05-271-0/+4
| | | | | | | | One case in BranchRelaxation did not compute liveins after creating a new block. This is catched by existing tests with an upcoming commit that will improve MachineVerifier checking of livein lists. llvm-svn: 304049
* LivePhysRegs: Add default for removeRegsInMask(Clobbers); NFCMatthias Braun2017-05-261-1/+1
| | | | llvm-svn: 304036
* MachineVerifier: Remove unused set; NFCMatthias Braun2017-05-261-5/+0
| | | | llvm-svn: 304035
* Make helper functions static. NFC.Benjamin Kramer2017-05-262-6/+8
| | | | llvm-svn: 304029
* DebugInfo: Do not emit empty CUsDavid Blaikie2017-05-262-15/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consistent with GCC and addresses a shortcoming with ThinLTO where many imported CUs may end up being empty (because the functions imported from them either ended up not being used (and were then discarded, since they're imported as available_externally) or optimized away entirely). Test cases previously testing empty CUs (either intentionally, or because they didn't need anything more complicated) had a trivial 'int' or similar basic type added to their retained types list. This is a first order approximation - a deeper implementation could do things like: 1) Be more lazy about construction of the CU - for example if two CUs containing a single identical retained type are linked together, with this change one of the two CUs will be produced but empty (since a duplicate type won't be produced). 2) Go further and invert all the CU links the same way the subprogram link is inverted - keep named CU lists of retained types, macros, etc, and have those link back to the CU. Then if they're emitted, the CU is emitted, but never otherwise - this would allow the metadata itself to be dropped earlier too, though it seems unlikely that's an important optimization as there shouldn't be many CUs relative to the number of other entities. llvm-svn: 304020
* DebugInfo: Don't include locations for debug-having code inlined into ↵David Blaikie2017-05-261-0/+4
| | | | | | | | | | | | | | | nodebug functions This produced 'strange' DWARF anyway - the CU would have no ranges (or at least not a range including the inlined code) nor any subprogram or inlined_subroutine - yet the line table would have entries for these instructions. (this actually becomes more relevant with changes coming after this, where a CU without any contents will be omitted entirely - so there would be no line table to put this on anyway) llvm-svn: 304004
* LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEIMatthias Braun2017-05-261-31/+48
| | | | | | | | | | | | | | | | | Re-commit r303938 and r303954 with a fix for addLiveIns(): the internal addPristines() function must be called on an empty set or it may accidentally reset saved registers. - addLiveOutsNoPristines() needs to add callee saved registers that are actually saved and restored somewhere to the set (they are not pristine). - Cleanup/rewrite the code for addLiveOuts()/addLiveOutsNoPristines(). This fixes the problem from D32156. Differential Revision: https://reviews.llvm.org/D32464 llvm-svn: 304001
* [DAGCombiner] use narrow vector ops to eliminate concat/extract (PR32790)Sanjay Patel2017-05-261-0/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the best case: extract (binop (concat X1, X2), (concat Y1, Y2)), N --> binop XN, YN ...we kill all of the extract/concat and just have narrow binops remaining. If only one of the binop operands is amenable, this transform is still worthwhile because we kill some of the extract/concat. Optional bitcasting makes the code more complicated, but there doesn't seem to be a way to avoid that. The TODO about extending to more than bitwise logic is there because we really will regress several x86 tests including madd, psad, and even a plain integer-multiply-by-2 or shift-left-by-1. I don't think there's anything fundamentally wrong with this patch that would cause those regressions; those folds are just missing or brittle. If we extend to more binops, I found that this patch will fire on at least one non-x86 regression test. There's an ARM NEON test in test/CodeGen/ARM/coalesce-subregs.ll with a pattern like: t5: v2f32 = vector_shuffle<0,3> t2, t4 t6: v1i64 = bitcast t5 t8: v1i64 = BUILD_VECTOR Constant:i64<0> t9: v2i64 = concat_vectors t6, t8 t10: v4f32 = bitcast t9 t12: v4f32 = fmul t11, t10 t13: v2i64 = bitcast t12 t16: v1i64 = extract_subvector t13, Constant:i32<0> There was no functional change in the codegen from this transform from what I could see though. For the x86 test changes: 1. PR32790() is the closest call. We don't reduce the AVX1 instruction count in that case, but we improve throughput. Also, on a core like Jaguar that double-pumps 256-bit ops, there's an unseen win because two 128-bit ops have the same cost as the wider 256-bit op. SSE/AVX2/AXV512 are not affected which is expected because only AVX1 has the extract/concat ops to match the pattern. 2. do_not_use_256bit_op() is the best case. Everyone wins by avoiding the concat/extract. Related bug for IR filed as: https://bugs.llvm.org/show_bug.cgi?id=33026 3. The SSE diffs in vector-trunc-math.ll are just scheduling/RA, so nothing real AFAICT. 4. The AVX1 diffs in vector-tzcnt-256.ll are all the same pattern: we reduced the instruction count by one in each case by eliminating two insert/extract while adding one narrower logic op. https://bugs.llvm.org/show_bug.cgi?id=32790 Differential Revision: https://reviews.llvm.org/D33137 llvm-svn: 303997
* [DAG] Move legal type checks in store merge to be checked onlyNirav Dave2017-05-261-2/+4
| | | | | | on non-legal cases. NFC. llvm-svn: 303994
* [ARM] Fix lowering of misaligned memcpy/memsetJohn Brawn2017-05-261-12/+12
| | | | | | | | | | | | | | | | | Currently getOptimalMemOpType returns i32 for large enough sizes without checking for alignment, leading to poor code generation when misaligned accesses aren't permitted as we generate a word store then later split it up into byte stores. This means we inadvertantly go over the MaxStoresPerMemcpy limit and for memset we splat the memset value into a word then immediately split it up again. Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type to use, but also fix a bug there where it wasn't correctly checking if misaligned memory accesses are allowed. Differential Revision: https://reviews.llvm.org/D33442 llvm-svn: 303990
* LivePhysRegs: Skip reserved regs in computeLiveIns; NFCIMatthias Braun2017-05-264-6/+12
| | | | | | | | | | Re-commit r303937 + r303949 as they were not the cause for the build failures. We do not track liveness of reserved registers so adding them to the liveins list in computeLiveIns() was completely unnecessary. llvm-svn: 303970
* Revert "LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEI"Matthias Braun2017-05-261-41/+27
| | | | | | | | | | Tentatively revert this to see if it fixes the buildbot stage2 breakages. This reverts commit r303938. This reverts commit r303954. llvm-svn: 303960
* Revert "LivePhysRegs: Skip reserved regs in computeLiveIns; NFCI"Matthias Braun2017-05-264-12/+6
| | | | | | | | | | Tentatively revert, suspecting that it caused breakage in stage2 buildbots. This reverts commit r303949. This reverts commit r303937. llvm-svn: 303955
* LivePhysRegs: Follow-up to r303937Matthias Braun2017-05-261-1/+1
| | | | | | | We may have situations in which a superregister is reserved and not added to liveins, so we have to add the subregisters. llvm-svn: 303949
* LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEIMatthias Braun2017-05-251-27/+41
| | | | | | | | | | | | | - addLiveOutsNoPristines() needs to add callee saved registers that are actually saved and restored somewhere to the set (they are not pristine). - Cleanup/rewrite the code for addLiveOuts()/addLiveOutsNoPristines(). This fixes the problem from D32156. Differential Revision: https://reviews.llvm.org/D32464 llvm-svn: 303938
* LivePhysRegs: Skip reserved regs in computeLiveIns; NFCIMatthias Braun2017-05-254-5/+11
| | | | | | | We do not track liveness of reserved registers so adding them to the liveins list in computeLiveIns() was completely unnecessary. llvm-svn: 303937
* DebugInfo: Simplify scopes+subprogram handling since the subprogram<>cu link ↵David Blaikie2017-05-251-18/+8
| | | | | | | | | | | | inversion Previously this code was defensive to the situation in which the debug info scopes would lead to a different subprogram from the subprogram in the CU's subprogram list (this could've happened with linkonce functions, etc as per the comment being removed). Since the CU<>SP link reversal this is no longer possible. llvm-svn: 303933
* Add constrained intrinsics for some libm-equivalent operationsAndrew Kaylor2017-05-255-61/+191
| | | | | | Differential revision: https://reviews.llvm.org/D32319 llvm-svn: 303922
* CodeGen: Rename DEBUG_TYPE to match passnamesMatthias Braun2017-05-2551-128/+112
| | | | | | | | Rename the DEBUG_TYPE to match the names of corresponding passes where it makes sense. Also establish the pattern of simply referencing DEBUG_TYPE instead of repeating the passname where possible. llvm-svn: 303921
* [CodeView Type Merging] Don't keep re-allocating temp serializer.Zachary Turner2017-05-251-2/+2
| | | | | | | | | | | | | | | | Previously, every time we wanted to serialize a field list record, we would create a new copy of FieldListRecordBuilder, which would in turn create a temporary instance of TypeSerializer, which itself had a std::vector<> that was about 128K in size. So this 128K allocation was happening every time. We can re-use the same instance over and over, we just have to clear its internal hash table and seen records list between each run. This saves us from the constant re-allocations. This is worth an ~18.5% speed increase (3.75s -> 3.05s) in my tests. Differential Revision: https://reviews.llvm.org/D33506 llvm-svn: 303919
* Fix SelectionDAGBuilder::getDbgValue to not expect DW_OP_deref on FI varsAdrian Prantl2017-05-251-14/+5
| | | | | | | | | | | | This fixes an oversight in r300522, which changed alloca dbg.values to no longer emit a DW_OP_deref. The array.ll testcase was regenerated from source. Fixes PR33166: https://bugs.llvm.org/show_bug.cgi?id=33166 llvm-svn: 303897
* DebugInfo: Produce debug_{gnu_}pub{names,types} entries when explicitly ↵David Blaikie2017-05-254-20/+27
| | | | | | | | | | | | | | | | | requested, even in -gmlt or when empty Turns out gold doesn't use the DW_AT_GNU_pubnames to decide whether to parse the rest of the DIEs when building gdb-index. This causes gold to trip over LLVM's output when there are DW_FORM_ref_addr present. Gold does use the presence of a debug_gnu_pub{names,types} entry for the CU to skip parsing the debug_info portion, so make sure that's included even when empty (technically, when empty there couldn't be any ref_addr anyway - it only came up when gmlt didn't produce any (even non-empty) pubnames - but given what that reveals about gold's implementation, this seems like a good thing to do for consistency). llvm-svn: 303894
* [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-05-241-17/+32
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 303820
* [DAG] Prevent crashes when merging constant stores with high-bit set. NFC.Nirav Dave2017-05-241-2/+2
| | | | llvm-svn: 303802
* MachineCSE: Respect interblock physreg livenessMikael Holmen2017-05-241-2/+2
| | | | | | | | | | | | | | | | | | | | Summary: This is a fix for PR32538. MachineCSE first looks at MO.isDead(), but if it is not marked dead, MachineCSE still wants to do its own check to see if it is trivially dead. This check for the trivial case assumed that physical registers cannot be live out of a block. Patch by Mattias Eriksson. Reviewers: qcolombet, jbhateja Reviewed By: qcolombet, jbhateja Subscribers: jbhateja, llvm-commits Differential Revision: https://reviews.llvm.org/D33408 llvm-svn: 303731
* Revert LLVM changes for "Sema: allow imaginary constants via GNU extension ↵Tim Northover2017-05-231-4/+1
| | | | | | | | if UDL overloads not present." The changes accidentally crept into a Clang commit I was making. llvm-svn: 303697
* Sema: allow imaginary constants via GNU extension if UDL overloads not present.Tim Northover2017-05-231-1/+4
| | | | | | | | | | | | | C++14 added user-defined literal support for complex numbers so that you can write something like "complex<double> val = 2i". However, there is an existing GNU extension supporting this syntax and interpreting the result as a _Complex type. This changes parsing so that such literals are interpreted in terms of C++14's operators if an overload is present but otherwise falls back to the original GNU extension. llvm-svn: 303694
* AsmPrinter: mark the beginning and the end of a function in verbose modeFrancis Visoiu Mistrih2017-05-231-2/+8
| | | | llvm-svn: 303690
* [DAG] Add AddressSpace parameter to canMergeStoresTo. NFC.Nirav Dave2017-05-231-7/+10
| | | | llvm-svn: 303673
* Fix DIEHash refactoring that dropped the DW_AT_name from the hashDavid Blaikie2017-05-231-0/+1
| | | | llvm-svn: 303669
* [DAG] Add canMergeStoresTo predicate checks. NFCI.Nirav Dave2017-05-231-4/+6
| | | | | | Propagate canMergeStoresTo checks to missing cases in StoreMerge. llvm-svn: 303668
* Refactor DWARF hashing to use a .def file to avoid repetitionDavid Blaikie2017-05-233-158/+65
| | | | llvm-svn: 303666
* [AArch64] Make instruction fusion more aggressive. Florian Hahn2017-05-231-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch makes instruction fusion more aggressive by * adding artificial edges between the successors of FirstSU and SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps. * updating PostGenericScheduler::tryCandidate to keep clusters together, similar to GenericScheduler::tryCandidate. This change increases the number of AES instruction pairs generated on Cortex-A57 and Cortex-A72. This doesn't change code at all in most benchmarks or general code, but we've seen improvement on kernels using AESE/AESMC and AESD/AESIMC. Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB Reviewed By: evandro Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33230 llvm-svn: 303618
* [KnownBits] Use !hasConflict() in asserts in place of Zero & One == 0 or ↵Craig Topper2017-05-231-17/+17
| | | | | | similar. NFC llvm-svn: 303614
* [CodeGen] Fix uninitialized variables exposed by r303084Vitaly Buka2017-05-221-2/+2
| | | | | | | All other calls of analyzeBranch reset PredTBB and PredFBB, so I assume it's expected behavior. llvm-svn: 303581
* Don't generate line&scope debug info for meta-instructions.Adrian Prantl2017-05-224-9/+10
| | | | | | | | | | | | | | | MachineInstructions that don't generate any code (such as IMPLICIT_DEFs) should not generate any debug info either. Fixes PR33107. https://bugs.llvm.org/show_bug.cgi?id=33107 This reapplies r303566 without any modifications. The stage2 build failures persisted even after reverting this patch, and looking back through history, it looks like these tests are flaky. llvm-svn: 303575
* Revert "Don't generate line&scope debug info for meta-instructions."Adrian Prantl2017-05-224-10/+9
| | | | | | This reverts commit r303566 while investigating a stage2 buildbot failure. llvm-svn: 303570
* Don't generate line&scope debug info for meta-instructions.Adrian Prantl2017-05-224-9/+10
| | | | | | | | | | | MachineInstructions that don't generate any code (such as IMPLICIT_DEFs) should not generate any debug info either. Fixes PR33107. https://bugs.llvm.org/show_bug.cgi?id=33107 llvm-svn: 303566
* [DAG] Rework store merge to loop on load candidates. NFCI.Nirav Dave2017-05-221-202/+225
| | | | | | | Continue to consider remaining candidate merges until all possible merges have been considered. llvm-svn: 303560
* SimplifyLibCalls: Optimize wcslenMatthias Braun2017-05-191-19/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | Refactor the strlen optimization code to work for both strlen and wcslen. This especially helps with programs in the wild where people pass L"string"s to const std::wstring& function parameters and the wstring constructor gets inlined. This also fixes a lingerind API problem/bug in getConstantStringInfo() where zeroinitializers would always give you an empty string (without a length) back regardless of the actual length of the initializer which did not work well in the TrimAtNul==false causing the PR mentioned below. Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG memcpy lowering and may lead to some cases for out-of-bounds zeroinitializer accesses not getting optimized anymore. So some code with UB may produce out of bound memory reads now instead of just producing zeros. The refactoring "accidentally" fixes http://llvm.org/PR32124 Differential Revision: https://reviews.llvm.org/D32839 llvm-svn: 303461
* [safestack] Disable stack coloring by default.Evgeniy Stepanov2017-05-191-1/+2
| | | | | | Workaround for apparent miscompilation of PR32143. llvm-svn: 303456
* Resubmit "[CodeView] Provide a common interface for type collections."Zachary Turner2017-05-191-28/+13
| | | | | | | | | | | | This was originally reverted because it was a breaking a bunch of bots and the breakage was not surfacing on Windows. After much head-scratching this was ultimately traced back to a bug in the lit test runner related to its pipe handling. Now that the bug in lit is fixed, Windows correctly reports these test failures, and as such I have finally (hopefully) fixed all of them in this patch. llvm-svn: 303446
* [DAGCombine] (addcarry 0, 0, X) -> (ext/trunc X)Amaury Sechet2017-05-191-0/+11
| | | | | | | | | | | | | | | Summary: While this makes some case better and some case worse - so it's unclear if it is a worthy combine just by itself - this is a useful canonicalisation. As per discussion in D32756 . Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32916 llvm-svn: 303441
OpenPOWER on IntegriCloud