path: root/llvm/lib
Commit message (Author, Date, Files, Lines changed)
...
* ARM64: make sure FastISel emits SSA MachineInstrs (Tim Northover, 2014-05-08, 1 file, -3/+4)
  We need to use a temporary register for a 2-step operation like REM.
  llvm-svn: 208297
* [asan] Preserve flags in asm instrumentation. (Evgeniy Stepanov, 2014-05-08, 1 file, -8/+32)
  Patch by Yuri Gorshenin.
  llvm-svn: 208296
* Use a vector of unique_ptrs to fix a memory leak introduced in r208179. (Daniel Sanders, 2014-05-08, 1 file, -2/+2)
  Also removed an inaccurate comment that stated that a DenseMap was used as
  storage for the ListInit*'s. It's currently using a FoldingSet.

  I expect there's a better way to fix this but I haven't found it yet.
  FoldingSet is incompatible with the Pool template and I'm not sure if
  FoldingSet can be safely replaced with a DenseMap of computed
  FoldingSetID's to ListInit*'s.
  llvm-svn: 208293
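  [Editor's note] The fix pattern here is general C++: give the container
  ownership so elements are destroyed with it. A minimal, self-contained
  sketch assuming a placeholder element type (illustrative, not the actual
  TableGen code):

      #include <memory>
      #include <vector>

      struct ListInitLike { /* fields elided */ };

      // Before: std::vector<ListInitLike *> Pool;  // elements leaked
      // After: the vector owns its elements and frees them on destruction.
      std::vector<std::unique_ptr<ListInitLike>> Pool;

      ListInitLike *create() {
        Pool.push_back(std::make_unique<ListInitLike>());
        return Pool.back().get();  // non-owning pointer for callers
      }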
* Move late partial-unrolling thresholds into the processor definitions (Hal Finkel, 2014-05-08, 6 files, -77/+75)
  The old method used by X86TTI to determine partial-unrolling thresholds was
  messy (because it worked by testing target features), and also would not
  correctly identify the target CPU if certain target features were disabled.
  After some discussions on IRC with Chandler et al., it was decided that the
  processor scheduling models were the right containers for this information
  (because it is often tied to special uop dispatch-buffer sizes).

  This does represent a small functionality change:
  - For generic x86-64 (which uses the SB model and, thus, will get some
    unrolling).
  - For AMD cores (because they still currently use the SB scheduling model).
  - For Haswell (based on benchmarking by Louis Gerbarg, it was decided to
    bump the default threshold to 50; we're working on a test case for this).
  Otherwise, nothing has changed for any other targets.

  The logic, however, has been moved into BasicTTI, so other targets may now
  also opt-in to this functionality simply by setting LoopMicroOpBufferSize in
  their processor model definitions; a sketch of the resulting hook follows
  this entry.
  llvm-svn: 208289
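  [Editor's note] A hedged sketch of how a BasicTTI-style unrolling hook
  might consume the model (the UnrollingPreferences fields and the
  LoopMicroOpBufferSize field existed in LLVM of this era, but treat the
  exact accessor shape as an assumption):

      // If the processor model advertises a loop dispatch-buffer size,
      // enable partial/runtime unrolling sized to fit inside it.
      unsigned BufSize = ST->getSchedModel()->LoopMicroOpBufferSize;
      if (BufSize > 0) {
        UP.Partial = UP.Runtime = true;
        UP.PartialThreshold = BufSize;
        UP.PartialOptSizeThreshold = BufSize;
      }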
* Revert "SCEV: Use I = vector<>.erase(I) to iterate and delete at the same time"Tobias Grosser2014-05-081-3/+6
| | | | | | as committed in r208282. The original commit was incorrect. llvm-svn: 208286
* AArch64/ARM64: Port NEON post-increment load/store with 2/3/4 vectors to ARM64 backend. (Hao Liu, 2014-05-08, 3 files, -56/+743)
  llvm-svn: 208284
* SCEV: Use I = vector<>.erase(I) to iterate and delete at the same time (Tobias Grosser, 2014-05-08, 1 file, -6/+3)
  llvm-svn: 208282
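  [Editor's note] The idiom named in the title is standard C++ (though, per
  the revert above, this particular use of it turned out to be incorrect). A
  minimal sketch of the pattern, with its classic pitfall noted:

      #include <vector>

      void eraseEvens(std::vector<int> &V) {
        // erase() returns the iterator following the removed element, so
        // assigning it back to I both deletes and advances in one step.
        // Pitfall: never cache E = V.end() here; erase() invalidates it.
        for (auto I = V.begin(); I != V.end(); /* no increment here */) {
          if (*I % 2 == 0)
            I = V.erase(I);
          else
            ++I;
        }
      }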
* ARM: support FK_SecRel_2 relocations on WoA (Saleem Abdulrasool, 2014-05-08, 2 files, -0/+6)
  This adds FK_SecRel_2 relocation support to ARM. This enables the building
  of object files for armv7-windows-msvc, which enables CodeView line tables
  for debugging, as opposed to armv7-windows-itanium, which currently uses
  DWARF.
  llvm-svn: 208273
* Simplify and fix incorrect comment. No functionality change. (Richard Smith, 2014-05-08, 1 file, -22/+15)
  llvm-svn: 208272
* Lower certain build_vectors to insertps instructions (Filipe Cabecinhas, 2014-05-08, 1 file, -0/+76)
  Summary:
  Vectors built with zeros and elements in the same order as another (source)
  vector are optimized to be built using a single insertps instruction. Also
  optimize when we move one element in a vector to a different place in that
  vector while zeroing out some of the other elements.

  Further optimizations are possible, described in TODO comments. I will be
  implementing at least some of them in the near future.

  Added some tests for different cases where this optimization triggers.

  Reviewers: nadav, delena, craig.topper
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D3521
  llvm-svn: 208271
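  [Editor's note] For context, insertps can copy one lane from a source
  vector and zero the rest in a single instruction, which is the kind of
  pattern this lowering targets. A small illustrative example using the
  SSE4.1 intrinsic (not the lowering code itself):

      #include <immintrin.h>  // compile with -msse4.1

      // Build {v[2], 0, 0, 0} with one insertps: immediate bits [7:6]
      // select the source lane (2), [5:4] the destination lane (0), and
      // [3:0] zero-mask lanes 1-3.
      __m128 lane2_zero_rest(__m128 v) {
        return _mm_insert_ps(v, v, (2 << 6) | (0 << 4) | 0x0E);
      }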
* Back out r208257 while I investigate tester failures. (Lang Hames, 2014-05-07, 1 file, -14/+0)
  llvm-svn: 208267
* GlobalValue: Assert symbols with local linkage have default visibility (Duncan P. N. Exon Smith, 2014-05-07, 1 file, -3/+2)
  The change to ExtractGV.cpp has no functionality change except to avoid the
  asserts. Existing testcases already cover this, so I didn't add a new one.
  llvm-svn: 208264
* IR: Don't allow non-default visibility on local linkage (Duncan P. N. Exon Smith, 2014-05-07, 2 files, -3/+27)
  Visibilities of `hidden` and `protected` are meaningless for symbols with
  local linkage.
  - Change the assembler to reject non-default visibility on symbols with
    local linkage.
  - Change the bitcode reader to auto-upgrade `hidden` and `protected` to
    `default` when the linkage is local.
  - Update LangRef.
  <rdar://problem/16141113>
  llvm-svn: 208263
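  [Editor's note] A minimal sketch of the auto-upgrade rule described above
  (illustrative; the real change lives in the bitcode reader and assembler,
  not in a standalone helper like this):

      #include "llvm/IR/GlobalValue.h"
      using namespace llvm;

      // hidden/protected are meaningless on local symbols: upgrade to default.
      static GlobalValue::VisibilityTypes
      upgradeVisibility(GlobalValue::LinkageTypes Linkage,
                        GlobalValue::VisibilityTypes Vis) {
        if (GlobalValue::isLocalLinkage(Linkage))
          return GlobalValue::DefaultVisibility;
        return Vis;
      }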
* LTO: Assert visibility of local linkage when merging symbols (Duncan P. N. Exon Smith, 2014-05-07, 1 file, -0/+2)
  `ModuleLinker::getLinkageResult()` shouldn't create symbols with local
  linkage and non-default visibility -- in fact, symbols with local linkage
  shouldn't be merged at all. Assert to that effect.
  llvm-svn: 208262
* LTO: Check local linkage first (Duncan P. N. Exon Smith, 2014-05-07, 1 file, -5/+5)
  Since visibility is meaningless for symbols with local linkage, check local
  linkage before visibility when setting symbol attributes.

  When linkage is `internal` and the visibility is `hidden`, the exposed
  attribute is now `LTO_SYMBOL_SCOPE_INTERNAL` instead of
  `LTO_SYMBOL_SCOPE_HIDDEN`. Although the bitfield allows *both* to be
  specified, the combination is nonsense anyway.

  Given changes (in progress) to drop visibility when a symbol has local
  linkage, this almost has no functionality change: it's mostly a cleanup to
  clarify the logic.

  The exception is when something has `appending` linkage. Before this
  change, such symbols would be advertised as `LTO_SYMBOL_SCOPE_INTERNAL`;
  now, they'll be given `LTO_SYMBOL_SCOPE_COMMON`.

  Unfortunately this is really awkward to test. This only changes what we
  advertise to linkers (before running LTO), not what the final object looks
  like. In theory I could add `DEBUG` output to `llvm-lto` (and test with
  "REQUIRES: asserts"), but follow-up commits to disallow `internal hidden`
  simplify this anyway.
  <rdar://problem/16141113>
  llvm-svn: 208261
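  [Editor's note] The reordering amounts to consulting linkage before
  visibility. A hedged sketch of the control flow (the scope constants are
  from llvm-c/lto.h; the surrounding function is illustrative):

      #include "llvm-c/lto.h"

      // Local linkage wins: visibility is only consulted for non-local symbols.
      static unsigned classifyScope(bool HasLocalLinkage, bool HasHiddenVisibility) {
        if (HasLocalLinkage)
          return LTO_SYMBOL_SCOPE_INTERNAL;
        if (HasHiddenVisibility)
          return LTO_SYMBOL_SCOPE_HIDDEN;
        return LTO_SYMBOL_SCOPE_DEFAULT;
      }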
* [RuntimeDyld] Make RuntimeDyldImpl::resolveExternalSymbols preserve the relocation entries it applies (Lang Hames, 2014-05-07, 1 file, -0/+14)
  Prior to this patch, RuntimeDyldImpl::resolveExternalSymbols discarded
  relocations for external symbols once they had been applied. This causes
  issues if the client calls MCJIT::finalizeLoadedModules more than once, and
  updates the location of any symbols in between (e.g. by calling
  MCJIT::mapSectionAddress).

  No test case yet: None of our in-tree memory managers support moving
  sections around. I'll have to hack up a dummy memory manager before I can
  write a unit test.

  Fixes <rdar://problem/16764378>
  llvm-svn: 208257
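  [Editor's note] A rough sketch of the behavioral difference, using
  placeholder types and helper names (assumptions modeled on RuntimeDyld,
  not its verbatim members):

      #include <cstdint>
      #include <map>
      #include <string>
      #include <vector>

      struct RelocationEntry { /* offset, type, addend, ... */ };
      using RelocationList = std::vector<RelocationEntry>;

      uint64_t lookupSymbolAddress(const std::string &Name);     // assumed
      void applyRelocationList(RelocationList &RL, uint64_t A);  // assumed

      std::map<std::string, RelocationList> ExternalSymbolRelocations;

      // Re-apply every external-symbol relocation against the symbol's
      // current address. Keeping the per-symbol lists (instead of erasing
      // them once applied) is what lets a second finalize pick up symbols
      // that moved in the meantime.
      void resolveExternalSymbols() {
        for (auto &KV : ExternalSymbolRelocations) {
          uint64_t Addr = lookupSymbolAddress(KV.first);
          applyRelocationList(KV.second, Addr);
          // Previously: KV.second was discarded here after being applied.
        }
      }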
* [X86TTI] Remove the unrolling branch limits (Hal Finkel, 2014-05-07, 1 file, -40/+13)
  The loop stream detector (LSD) on modern Intel cores, which optimizes the
  execution of small loops, has limits on the number of taken branches in
  addition to uop-count limits (modern AMD cores have similar limits).
  Unfortunately, at the IR level, estimating the number of branches that will
  be taken is difficult. For one thing, it strongly depends on later passes
  (block placement, etc.).

  The original implementation took a conservative approach and limited the
  maximal BB DFS depth of the loop. However, fairly-extensive benchmarking by
  several of us has revealed that this is the wrong approach. In fact, there
  are zero known cases where the branch limit prevents a detrimental
  unrolling (but plenty of cases where it does prevent beneficial unrolling).

  While we could improve the current branch counting logic by incorporating
  branch probabilities, this further complication seems unjustified without a
  motivating regression. Instead, unless and until a regression appears, the
  branch counting will be removed.
  llvm-svn: 208255
* [X86] Selectively mark the FMA variants inside a family as isCommutable. (Quentin Colombet, 2014-05-07, 1 file, -14/+32)
  Given a FMA family (e.g., 213, 231), not all the variants (i.e., register
  or memory) are commutable.
  E.g., for the 213 family (with the syntax src1, src2, src3):
    fmaXXX213 A, B, reg3/mem3 == fmaXXX213 B, A, reg3/mem3
  Now consider the 231 family:
    fmaXXX231 A, B, reg3 == fmaXXX231 A, reg3, B
  But:
    fmaXXX231 A, B, mem3 != fmaXXX231 A, mem3, B
  Indeed, mem3 cannot be the second argument of the memory variant of
  fmaXXX231.

  Working on a reduced test case!
  <rdar://problem/16800495>
  llvm-svn: 208252
* Reformat a couple of functions for clarity. (Eric Christopher, 2014-05-07, 1 file, -22/+19)
  llvm-svn: 208248
* [Hexagon] Add New TSFlags to be used in the upcoming patches. (Jyotsna Verma, 2014-05-07, 4 files, -67/+102)
  llvm-svn: 208239
* avoid segfaulting (Sebastian Pop, 2014-05-07, 1 file, -2/+1)
  *Quotient and *Remainder don't have to be initialized.
  llvm-svn: 208238
* do not collect undef terms (Sebastian Pop, 2014-05-07, 1 file, -1/+36)
  llvm-svn: 208237
* Fix using wrong result type for setcc. (Matt Arsenault, 2014-05-07, 2 files, -4/+16)
  When reducing the bitwidth of a comparison against a constant, the original
  setcc's result type was used, which was incorrect.

  No test since I don't think any other in-tree targets change the bitwidth
  of the setcc type depending on the bitwidth of the compared type.
  llvm-svn: 208236
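  [Editor's note] A hedged sketch of the shape of such a fix in a DAG
  combine (illustrative; the variable names are assumptions, and the
  getSetCCResultType signature has varied across LLVM versions):

      // When narrowing the compared operands to NewVT, re-query the setcc
      // result type for NewVT rather than reusing N->getValueType(0).
      EVT SetCCVT = TLI.getSetCCResultType(*DAG.getContext(), NewVT);
      SDValue Cmp = DAG.getSetCC(SDLoc(N), SetCCVT, NewLHS, NewRHS, CC);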
* split delinearization pass in 3 steps (Sebastian Pop, 2014-05-07, 3 files, -397/+484)
  To compute the dimensions of the array in a unique way, we split the
  delinearization analysis in three steps:
  - find parametric terms in all memory access functions
  - compute the array dimensions from the set of terms
  - compute the delinearized access functions for each dimension

  The first step is executed on all the memory access functions such that we
  gather all the patterns in which an array is accessed. The second step
  reduces all this information into a unique description of the sizes of the
  array. The third step is delinearizing each memory access function
  following the common description of the shape of the array computed in
  step 2. (A sketch of how a client drives these steps follows this entry.)

  This rewrite of the delinearization pass also solves a problem we had with
  the previous implementation: because the previous algorithm was by
  induction on the structure of the SCEV, it would not correctly recognize
  the shape of the array when the memory access was not following the
  nesting of the loops: for example, see
  polly/test/ScopInfo/multidim_only_ivs_3d_reverse.ll

    ; void foo(long n, long m, long o, double A[n][m][o]) {
    ;
    ;   for (long i = 0; i < n; i++)
    ;     for (long j = 0; j < m; j++)
    ;       for (long k = 0; k < o; k++)
    ;         A[i][k][j] = 1.0;

  Starting with this patch we no longer delinearize access functions that do
  not contain parameters, for example in
  test/Analysis/DependenceAnalysis/GCD.ll

    ;; for (long int i = 0; i < 100; i++)
    ;;   for (long int j = 0; j < 100; j++) {
    ;;     A[2*i - 4*j] = i;
    ;;     *B++ = A[6*i + 8*j];

  These accesses will not be delinearized, as the upper bounds of the loops
  are constants and their access functions do not contain SCEVUnknown
  parameters.
  llvm-svn: 208232
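  [Editor's note] A hedged sketch of driving the three steps from a client.
  The method names match the delinearization API of roughly this era, but
  treat the exact signatures (and the SE/AR/ElementSize context) as
  assumptions:

      // Given the SCEV of a memory access (an add-recurrence AR) and the
      // element size, recover per-dimension subscripts in three steps.
      SmallVector<const SCEV *, 4> Terms, Sizes, Subscripts;
      AR->collectParametricTerms(SE, Terms);             // step 1: gather terms
      SE.findArrayDimensions(Terms, Sizes, ElementSize); // step 2: array shape
      AR->computeAccessFunctions(SE, Subscripts, Sizes); // step 3: subscripts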
* [x86] Make the 'x86-64' cpu, what I see as (and many use as) the generic default architecture for reasonable modern x86 processors, actually be modern (Chandler Carruth, 2014-05-07, 1 file, -2/+15)
  This processor model should essentially be "tuned" for modern x86 chips as
  much as possible without undue penalties on any specific architecture.
  Previously we weren't even using the nice scheduling models. There are a
  few other tweaks needed here, but this change at least I have benchmarked
  across a decent swath of chips (intel's clovertown, westmere, and
  sandybridge; amd's istanbul) and seen no significant regressions.

  If anyone has suggested ways to test this, just let me know. Somewhat
  alarmingly, no existing tests failed.
  llvm-svn: 208230
* Tidy up whitespace with clang-format prior to making significant changes. (Chandler Carruth, 2014-05-07, 1 file, -45/+41)
  llvm-svn: 208229
* [yaml2obj] Support ELF x86 relocations. (Simon Atanasyan, 2014-05-07, 1 file, -0/+43)
  llvm-svn: 208228
* [ARM64][fast-isel] Disable target specific optimizations at -O0. (Chad Rosier, 2014-05-07, 3 files, -31/+30)
  Functionally, this patch disables the dead register elimination pass and
  the load/store pair optimization pass at -O0. The ILP optimizations don't
  require the optimization level to be checked because the call to addILPOpts
  is predicated with the necessary check. The AdvSIMDScalar pass is disabled
  by default at all optimization levels. This patch leaves that pass disabled
  by default.

  Also, move command-line options into ARM64TargetMachine.cpp and add a few
  additional flags to aid in debugging. This fixes an issue with the
  -debug-pass=Structure flag where passes were printed, but not actually run
  (i.e., AdvSIMDScalar pass).
  llvm-svn: 208223
* [mips] Add highly experimental support for MIPS-I, MIPS-II, MIPS-III, and MIPS-V (Daniel Sanders, 2014-05-07, 3 files, -5/+38)
  Summary:
  These processors will only be available for the integrated assembler at
  first (CodeGen will emit a fatal error saying they are not implemented).

  The intention is to work through the existing instructions and correctly
  annotate the ISA they were added in so that we have a sufficiently good
  base to start MIPS64r6 development. MIPS64r6 removes/re-encodes certain
  instructions and I believe it is best to define ISA's using set-union's as
  far as possible rather than using set-subtraction.

  Reviewers: vmedic
  Subscribers: emaste, llvm-commits
  Differential Revision: http://reviews.llvm.org/D3569
  llvm-svn: 208221
* llvm-cov: Explicitly namespace llvm::make_unique to keep MSVC happy (Justin Bogner, 2014-05-07, 1 file, -4/+2)
  This is a followup to r208171, where a call to make_unique was
  disambiguated for MSVC. Disambiguate two more calls, and remove the comment
  about it since this is what we do everywhere.
  llvm-svn: 208219
* Use range loop. (Rafael Espindola, 2014-05-07, 1 file, -3/+2)
  llvm-svn: 208218
* [InstCombine] Add optimization of redundant insertvalue instructions. (Michael Zolotukhin, 2014-05-07, 2 files, -0/+37)
  rdar://problem/11861387
  llvm-svn: 208214
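  [Editor's note] A minimal sketch of the kind of fold this enables
  (illustrative, not the committed code): when an insertvalue overwrites
  exactly the field written by an earlier, single-use insertvalue, the
  earlier write is dead.

      #include "llvm/IR/Instructions.h"
      using namespace llvm;

      // Fold insertvalue(insertvalue(Agg, A, idx), B, idx)
      //   -> insertvalue(Agg, B, idx) when the inner insert has no other uses.
      static Value *foldRedundantInsertValue(InsertValueInst &IV) {
        auto *Prev = dyn_cast<InsertValueInst>(IV.getAggregateOperand());
        if (Prev && Prev->hasOneUse() && Prev->getIndices() == IV.getIndices())
          return InsertValueInst::Create(Prev->getAggregateOperand(),
                                         IV.getInsertedValueOperand(),
                                         IV.getIndices());
        return nullptr;  // no fold applies
      }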
* [mips] Add FGR_32/FGR_64/GPR_64 adjectives and use them instead of FGRPredicates/GPRPredicates (Daniel Sanders, 2014-05-07, 3 files, -161/+156)
  Summary: No functional change (confirmed by diffing tablegen-erated files).
  Depends on D3642

  Reviewers: vmedic, dsanders
  Reviewed By: dsanders
  Differential Revision: http://reviews.llvm.org/D3645
  llvm-svn: 208213
* [mips] Add INSN_<name> adverbs and start using them instead of AdditionalPredicates overrides (Daniel Sanders, 2014-05-07, 1 file, -6/+8)
  Summary: No functional change
  Depends on D3641

  Reviewers: vmedic
  Reviewed By: vmedic
  Differential Revision: http://reviews.llvm.org/D3642
  llvm-svn: 208212
* [msan] Fix -fsanitize=memory -fno-integrated-as. (Evgeniy Stepanov, 2014-05-07, 1 file, -1/+1)
  llvm-svn: 208211
* AArch64/ARM64: optimise vector selects & enable test (Tim Northover, 2014-05-07, 1 file, -0/+41)
  When performing a scalar comparison that feeds into a vector select, it's
  actually better to do the comparison on the vector side: the scalar route
  would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP", since the vector
  comparisons are all mask based.
  llvm-svn: 208210
* [mips] Add ISA_<name> adverbs and start using them instead of AdditionalPredicates overrides (Daniel Sanders, 2014-05-07, 4 files, -42/+46)
  Summary:
  One small functional change. The recently added PAUSE instruction now has
  the HasStdEnc predicate, which was accidentally removed by a Requires<>.
  Depends on D3640

  Reviewers: vmedic
  Reviewed By: vmedic
  Differential Revision: http://reviews.llvm.org/D3641
  llvm-svn: 208209
* Remove the UseCFI option from createAsmStreamer. (Rafael Espindola, 2014-05-07, 8 files, -29/+20)
  We were already always passing true; this just removes the option.
  llvm-svn: 208205
* [mips] Continue splitting Instruction.Predicates into smaller lists and re-join them with !listconcat (Daniel Sanders, 2014-05-07, 3 files, -29/+39)
  Summary:
  Move IsGP64bit into GPRPredicates, and IsFP64bit/NotFP64bit into
  FGRPredicates.

  No functional change (confirmed by diffing tablegen-erated files).
  Depends on D3639

  Reviewers: vmedic
  Reviewed By: vmedic
  Differential Revision: http://reviews.llvm.org/D3640
  llvm-svn: 208201
* [ARM64-BE] Fix fast-isel, and add appropriate RUN lines to appropriate tests. (James Molloy, 2014-05-07, 1 file, -0/+5)
  llvm-svn: 208200
* [ARM64-BE] Fix variable-argument saving. (James Molloy, 2014-05-07, 1 file, -1/+2)
  llvm-svn: 208199
* [ARM64-BE] Implement the lane-twiddling logic at AAPCS boundaries for big endian. (James Molloy, 2014-05-07, 1 file, -0/+17)
  The AAPCS states that values passed in registers must have a value as
  though they had been loaded with "LDR". LDR is equivalent to "LD1.64
  vX.1D" - that is, loading scalars to vector registers and loading
  1-element vectors is equivalent.

  The logic implemented here is to ensure that at all call boundaries and
  during formal argument lowering all vectors are treated as their
  bitwidth-based floating point scalar counterpart, which is always one of
  f64 or f128 (v2i32 -> f64, v4i32 -> f128 etc). A BITCAST is inserted so
  that the appropriate REV will be generated during code generation.
  llvm-svn: 208198
* [mips] Move IsFP64bit/NotFP64bit to the front of the AdditionalPredicates list (Daniel Sanders, 2014-05-07, 1 file, -6/+6)
  Summary:
  This makes it easier to prove a more complicated change in the next commit
  is non-functional.

  Reviewers: vmedic
  Reviewed By: vmedic
  Differential Revision: http://reviews.llvm.org/D3639
  llvm-svn: 208197
* [ARM64-BE] Implement the crazy bitcast handling for big endian vectors. (James Molloy, 2014-05-07, 1 file, -46/+326)
  Because we've canonicalised on using LD1/ST1, every time we do a bitcast
  between vector types we must do an equivalent lane reversal.

  Consider a simple memory load followed by a bitconvert then a store:
    v0 = load v2i32
    v1 = BITCAST v2i32 v0 to v4i16
         store v4i16 v2

  In big endian mode every memory access has an implicit byte swap. LDR and
  STR do a 64-bit byte swap, whereas LD1/ST1 do a byte swap per lane - that
  is, they treat the vector as a sequence of elements to be byte-swapped.
  The two pairs of instructions are fundamentally incompatible. We've decided
  to use LD1/ST1 only to simplify compiler implementation.

  LD1/ST1 perform the equivalent of a sequence of LDR/STR + REV. This makes
  the original code sequence:
    v0 = load v2i32
    v1 = REV v2i32 (implicit)
    v2 = BITCAST v2i32 v1 to v4i16
    v3 = REV v4i16 v2 (implicit)
         store v4i16 v3

  But this is now broken - the value stored is different to the value loaded
  due to lane reordering. To fix this, on every BITCAST we must perform two
  other REVs:
    v0 = load v2i32
    v1 = REV v2i32 (implicit)
    v2 = REV v2i32
    v3 = BITCAST v2i32 v2 to v4i16
    v4 = REV v4i16
    v5 = REV v4i16 v4 (implicit)
         store v4i16 v5

  This means an extra two instructions, but actually in most cases the two
  REV instructions can be combined into one. For example:
    (REV64_2s (REV64_4h X)) === (REV32_4h X)

  There is also no 128-bit REV instruction. This must be synthesized with an
  EXT instruction.

  Most bitconverts require some sort of conversion. The only exceptions are:
    a) Identity conversions - vNfX <-> vNiX
    b) Single-lane-to-scalar - v1fX <-> fX or v1iX <-> iX

  Even though there are hundreds of changed lines, I have a fairly high
  confidence that they are somewhat correct. The changes to add two REV
  instructions per bitcast were pretty mechanical, and once I'd done that I
  threw the resulting .td at a script I wrote which combined the two REVs
  together (and added an EXT instruction, for f128) based on an instruction
  description I gave it. This was much less prone to error than doing it all
  manually, plus my brain would not just have melted but would have
  vapourised.
  llvm-svn: 208194
* [ARM64-BE] Predicate VLDR/VSTR for vectors as little-endian only. We must use LD1/ST1 on big-endian. (James Molloy, 2014-05-07, 1 file, -95/+131)
  llvm-svn: 208193
* [ARM64-BE] Make big endian (scalar) argument passing work correctly. (James Molloy, 2014-05-07, 1 file, -6/+38)
  This completes the port of r204814 (cpirker "AArch64_BE function argument
  passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues
  found during later development along the way. The biggest of these was
  that the alignment fixup logic wasn't replicated into all the places it
  should have been.
  llvm-svn: 208192
* MergeFunctions Pass, introduced total ordering among values. (Stepan Dyatkovskiy, 2014-05-07, 1 file, -41/+96)
  This is the third patch of a patch series that improves MergeFunctions
  performance time from O(N*N) to O(N*log(N)).

  This patch description:
  When comparing functions we need to compare the values we meet at the left
  and right sides. It is easy to sort things out for external values: they
  just have to be the same value on both sides. But for local values (those
  introduced inside the function body) we have to ensure they were introduced
  at exactly the same place, and play the same role.

  In short, the patch introduces value serial numbering and a comparison
  routine; the latter compares two values by their serial numbers.
  llvm-svn: 208189
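  [Editor's note] A minimal sketch of serial-number comparison in that
  spirit (the class and function names are illustrative, not the pass's
  actual API):

      #include "llvm/ADT/DenseMap.h"
      #include "llvm/IR/Value.h"
      using namespace llvm;

      // Assigns each value a serial number in order of first appearance.
      struct ValueEnumerator {
        DenseMap<const Value *, unsigned> Numbers;
        unsigned getNumber(const Value *V) {
          // insert() is a no-op for already-numbered values.
          return Numbers.insert({V, Numbers.size() + 1}).first->second;
        }
      };

      // memcmp-style result: two locals play the same role iff they were
      // first seen at the same position in their respective functions.
      static int cmpValues(ValueEnumerator &L, ValueEnumerator &R,
                           const Value *LV, const Value *RV) {
        unsigned LN = L.getNumber(LV), RN = R.getNumber(RV);
        return LN < RN ? -1 : (LN > RN ? 1 : 0);
      }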
* [mips] Split Instruction.Predicates into smaller lists and re-join them with !listconcat (Daniel Sanders, 2014-05-07, 7 files, -77/+98)
  Summary:
  The overall idea is to chop the Predicates list into subsets that are
  usually overridden independently. This allows subclasses to partially
  override the predicates of their superclasses without having to re-add all
  the existing predicates.

  This patch starts the process by moving HasStdEnc into a new
  EncodingPredicates list and almost everything else into
  AdditionalPredicates.

  It has revealed a couple of likely bugs where 'let Predicates' has removed
  the HasStdEnc predicate.

  No functional change (confirmed by diffing tablegen-erated files).
  Depends on D3549, D3506

  Reviewers: vmedic
  Differential Revision: http://reviews.llvm.org/D3550
  llvm-svn: 208184
* [tablegen] Add !listconcat operator with similar semantics to !strconcat (Daniel Sanders, 2014-05-07, 4 files, -2/+36)
  Summary:
  It concatenates two or more lists. In addition to the !strconcat semantics,
  the lists must have the same element type.

  My overall aim is to make it easy to append to Instruction.Predicates
  rather than override it. This can be done by concatenating lists passed as
  arguments, or by concatenating lists passed in additional fields.

  Reviewers: dsanders
  Reviewed By: dsanders
  Subscribers: hfinkel, llvm-commits
  Differential Revision: http://reviews.llvm.org/D3506
  llvm-svn: 208183
* [mips] Move HasStdEnc to the front of the predicates lists. (Daniel Sanders, 2014-05-07, 5 files, -61/+61)
  Summary:
  This will make it easier to prove that a more complicated change in the
  following commit is non-functional. No functional change.
  Depends on D3506

  Reviewers: vmedic
  Reviewed By: vmedic
  Differential Revision: http://reviews.llvm.org/D3549
  llvm-svn: 208179