path: root/llvm/lib
Commit message / Author / Age / Files / Lines
* Align Modi and FileInfo substreams on 32-byte offsets.Rui Ueyama2016-11-161-4/+4
| | | | | | | | | | This is required by DbiStream, but DbiStreamBuilder didn't align these substreams, so the output of DbiStreamBuilder couldn't be read by DbiStream. A test will be added to LLD. llvm-svn: 287067
* Fixed the lost FastMathFlags for CALL operations in SLPVectorizer.Vyacheslav Klochkov2016-11-161-0/+1
| | | | | | | Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26575 llvm-svn: 287064
* [BypassSlowDivision] Handle division by constant numerators better.Justin Lebar2016-11-161-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: We don't do BypassSlowDivision when the denominator is a constant, but we do do it when the numerator is a constant. This patch makes two related changes to BypassSlowDivision when the numerator is a constant: * If the numerator is too large to fit into the bypass width, don't bypass slow division (because we'll never run the smaller-width code). * If we bypass slow division where the numerator is a constant, don't OR together the numerator and denominator when determining whether both operands fit within the bypass width. We need to check only the denominator. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26699 llvm-svn: 287062
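For context, a rough C++ model of the guard such a bypass emits, assuming a 64-bit udiv bypassed by a 32-bit divide (illustrative source code, not the IR the pass actually builds):

    #include <cstdint>

    // Generic case: both operands must fit in the bypass width at run time.
    uint64_t div64_with_bypass(uint64_t Num, uint64_t Den) {
      if (((Num | Den) >> 32) == 0)
        return uint64_t(uint32_t(Num) / uint32_t(Den)); // fast 32-bit divide
      return Num / Den;                                 // slow 64-bit divide
    }

    // Constant numerator that fits in 32 bits: only the denominator needs the
    // run-time check. If the constant did not fit, the fast path could never
    // be taken, so no bypass would be emitted at all.
    uint64_t div64_const_num(uint64_t Den) {
      const uint64_t Num = 1000; // hypothetical compile-time constant
      if ((Den >> 32) == 0)
        return uint64_t(uint32_t(Num) / uint32_t(Den));
      return Num / Den;
    }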
* [BypassSlowDivision] Simplify partially-tautological if statement.Justin Lebar2016-11-161-4/+3
| | | | | | if (A || (B && A)) --> if (A). llvm-svn: 287061
* Fix Modi and File count if there are more than 65535 modules/files.Rui Ueyama2016-11-161-2/+2
| | | | | | | | These numbers are intended to be capped at 65535, but `std::max<uint16_t>(UINT16_MAX, N)` always returns N for any N because the expression is the same as `std::max((uint16_t)UINT16_MAX, (uint16_t)N)`. llvm-svn: 287060
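A standalone illustration of the narrowing pitfall behind this fix (a generic example, not the DbiStreamBuilder expression itself): an explicit 16-bit template argument truncates the wider count before the comparison even happens, so the intended cap at 65535 is defeated.

    #include <algorithm>
    #include <cstdint>
    #include <iostream>

    int main() {
      uint32_t N = 70000; // more than 65535 modules/files
      // The uint16_t template argument truncates N to 70000 % 65536 == 4464
      // before std::min compares anything.
      std::cout << std::min<uint16_t>(UINT16_MAX, N) << "\n"; // prints 4464
      // Comparing in the wide type gives the intended cap.
      std::cout << std::min<uint32_t>(UINT16_MAX, N) << "\n"; // prints 65535
      return 0;
    }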
* Always use relative jump table encodings on PowerPC64.Joerg Sonnenberger2016-11-162-0/+59
| | | | | | | | | | | | | | | | For the default, small and medium code models, use the existing difference from the jump table towards the label. For all other code models, set up the picbase and use the difference between the picbase and the block address. Overall, this results in smaller data tables at the expense of one or two more arithmetic operations at the jump site. Given that we only create jump tables with a lot more than two entries, it is a net win in size. For larger code models the assumption remains that individual functions are no larger than 2GB. Differential Revision: https://reviews.llvm.org/D26336 llvm-svn: 287059
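As a sketch of the size trade-off being described (a hypothetical helper, not the actual PPC64 lowering): a relative jump table stores 32-bit differences, and the jump site adds them back onto a base, which is the jump table itself or the picbase depending on the code model.

    #include <cstdint>

    // One extra add at the jump site turns a small 32-bit entry back into a
    // full 64-bit target address.
    uintptr_t jumpTarget(uintptr_t Base /* jump table or picbase */,
                         const int32_t *Table, unsigned Index) {
      return Base + static_cast<uintptr_t>(static_cast<intptr_t>(Table[Index]));
    }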
* AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argumentJan Vesely2016-11-151-0/+4
| | | | | | | | wbinvl.* are vector instructions that do not use vector registers. v2: check only M?BUF instructions Differential Revision: https://reviews.llvm.org/D26633 llvm-svn: 287056
* [AddressSanitizer] Add support for (constant-)masked loads and stores.Filipe Cabecinhas2016-11-151-14/+85
| | | | | | | | | | | | | | | This patch adds support for instrumenting masked loads and stores under ASan, if they have a constant mask. isInterestingMemoryAccess now supports returning a mask to be applied to the loads, and instrumentMop will use it to generate additional checks. Added tests for v4i32, v8i32, and v4p0i32 (~v4i64) for both loads and stores (as well as a test to verify we don't add checks to non-constant masks). Differential Revision: https://reviews.llvm.org/D26230 llvm-svn: 287047
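Conceptually (a hand-written sketch, not the ASan pass itself), a constant mask means only the enabled lanes are ever dereferenced, so only those lanes need a shadow check:

    #include <cstddef>
    #include <cstdint>
    #include <functional>

    // checkAccess stands in for whatever per-address check the instrumentation
    // inserts; disabled lanes are never touched, so they get no check.
    void instrumentConstMaskedAccess(
        uintptr_t Base, size_t ElemSize, unsigned NumLanes, const bool *ConstMask,
        const std::function<void(uintptr_t, size_t)> &checkAccess) {
      for (unsigned Lane = 0; Lane < NumLanes; ++Lane)
        if (ConstMask[Lane])
          checkAccess(Base + Lane * ElemSize, ElemSize);
    }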
* Object: replace backslashes with slashes in embedded relative thin archive paths on Windows.Peter Collingbourne2016-11-151-0/+6
| | | | | | | | This makes these thin archives portable between *nix and Windows. Differential Revision: https://reviews.llvm.org/D26696 llvm-svn: 287038
* [AArch64] Add support for Qualcomm's Falkor CPU.Chad Rosier2016-11-153-0/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D26673 llvm-svn: 287036
* AMDGPU/SI: Fix pattern for i16 = sign_extend i1Tom Stellard2016-11-151-1/+5
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26670 llvm-svn: 287035
* [sanitizer-coverage] make sure asan does not instrument coverage guards (reported in https://github.com/google/oss-fuzz/issues/84)Kostya Serebryany2016-11-151-1/+1
| | | | | | | llvm-svn: 287030
* Fix llvm-symbolizer to correctly sort a symbol array and calculate symbol sizesKuba Brecka2016-11-151-12/+6
| | | | | | | | Sometimes, llvm-symbolizer gives wrong results due to incorrect sizes of some symbols. The reason for that was an incorrectly sorted array in computeSymbolSizes. The comparison function used subtraction of unsigned types, which is incorrect. Let's change this to return explicit -1 or 1. Differential Revision: https://reviews.llvm.org/D26537 llvm-svn: 287028
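The underlying pitfall, shown in isolation (this is not the computeSymbolSizes code): returning a subtraction from a comparator is wrong for unsigned operands because the difference never goes negative, and truncating it can lose the ordering entirely.

    #include <cstdint>

    // Broken: A - B wraps instead of going negative, and truncating the
    // 64-bit difference to int discards the high bits, so e.g.
    // A = 0x100000000, B = 0 typically compares as "equal".
    int badCompare(uint64_t A, uint64_t B) {
      return static_cast<int>(A - B);
    }

    // Fixed: compare explicitly and return -1, 0 or 1.
    int goodCompare(uint64_t A, uint64_t B) {
      if (A < B) return -1;
      if (A > B) return 1;
      return 0;
    }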
* GlobalISel: remove unused variable to silence warning.Tim Northover2016-11-152-2/+1
| | | | llvm-svn: 287027
* AMDGPU: Enable store clusteringMatt Arsenault2016-11-153-1/+13
| | | | | | | Also respect the TII hook for these like the generic code does in case we want a flag later to disable this. llvm-svn: 287021
* [AArch64] Lower multiplication by a constant int to shl+add+shlHaicheng Wu2016-11-151-9/+39
| | | | | | | | Lower a = b * C where C = (2^n + 1) * 2^m to: add w0, w0, w0, lsl n; lsl w0, w0, m. Differential Revision: https://reviews.llvm.org/D229245 llvm-svn: 287019
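A worked instance of the pattern (illustrative C++, not the AArch64 selection code): for C = 40 = (2^2 + 1) * 2^3, i.e. n = 2 and m = 3, the multiply becomes one shifted add followed by one shift.

    #include <cassert>
    #include <cstdint>

    //   add w0, w0, w0, lsl #2   ->  t = b + (b << 2)   (= b * 5)
    //   lsl w0, w0, #3           ->  t << 3             (= b * 40)
    uint32_t mulBy40(uint32_t B) {
      uint32_t T = B + (B << 2);
      return T << 3;
    }

    int main() {
      assert(mulBy40(7) == 280); // 7 * 40
      return 0;
    }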
* AMDGPU: Analyze mubuf with immediate soffsetMatt Arsenault2016-11-151-1/+6
| | | | | | | Fixes giving up on clustering common addr64 accesses with constant 0 soffset. llvm-svn: 287018
* AMDGPU: Fix return after elseMatt Arsenault2016-11-151-8/+14
| | | | llvm-svn: 287015
* Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.Wei Mi2016-11-151-5/+6
| | | | | | llvm-svn: 287014
* AMDGPU: Replace assert(false) with unreachableMatt Arsenault2016-11-153-11/+17
| | | | llvm-svn: 287013
* [AMDGPU] Add wave barrier builtinStanislav Mekhanoshin2016-11-153-0/+20
| | | | | | | | | | | The wave barrier represents the discardable barrier. Its main purpose is to carry the convergent attribute, thus preventing illegal CFG optimizations. All lanes in a wave come to the convergence point simultaneously with SIMT, thus no special instruction is needed in the ISA. The barrier is discarded during code generation. Differential Revision: https://reviews.llvm.org/D26585 llvm-svn: 287007
* [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.Wei Mi2016-11-151-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In RateRegister of the existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outer loop, the formula will be marked as Loser and dropped. Suppose we have IR in which %for.body is the outer loop and %for.body2 is the inner loop. LSR only handles the inner loop now, so only %for.body2 will be handled. Using the logic above, a formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr-type reg related to the outer loop. Only a formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related to the outer loop is folded into the initial value of the SCEVAddRecExpr related to the current loop. But in some cases, we do need to share the basic induction variable reg({0,+,1}<%for.body2>) among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop a formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. According to the existing comment, this is meant to avoid considering multiple-level loops at the same time. However, the existing LSR only handles the innermost loop, so any SCEVAddRecExpr with a loop other than the current loop is an invariant and is simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 286999
* Integer legalization: fix MUL expansionPawel Bylica2016-11-151-4/+4
| | | | | | | | | | | | | | Summary: This fixes the runtime results produced by the fallback multiplication expansion introduced in r270720. For tests I created a fuzz tester that compares the results with Boost.Multiprecision. Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26628 llvm-svn: 286998
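For background, a stand-alone sketch of the kind of expansion involved (not the legalizer code itself): a wide multiply is rebuilt from narrower halves, and getting the cross terms right is exactly what such a fix is about; the full expansion also has to produce the high half, which is where the carry handling gets trickier.

    #include <cassert>
    #include <cstdint>

    // Low 64 bits of a 64x64 multiply built only from 32-bit halves.
    uint64_t mul64FromHalves(uint64_t A, uint64_t B) {
      uint64_t ALo = A & 0xffffffffu, AHi = A >> 32;
      uint64_t BLo = B & 0xffffffffu, BHi = B >> 32;
      uint64_t Lo = ALo * BLo;                // low x low
      uint64_t Cross = ALo * BHi + AHi * BLo; // cross terms shift into the top half
      return Lo + (Cross << 32);              // AHi * BHi only affects bits >= 64
    }

    int main() {
      uint64_t A = 0x123456789abcdef0ULL, B = 0xfedcba9876543210ULL;
      assert(mul64FromHalves(A, B) == A * B); // both sides reduced mod 2^64
      return 0;
    }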
* vector load store with length (left justified) llvm portionZaara Syeda2016-11-151-4/+16
| | | | llvm-svn: 286993
* fix formatting; NFCSanjay Patel2016-11-151-1/+1
| | | | llvm-svn: 286989
* [IndVars] Change the order to compute WidenAddRec in widenIVUse.Wei Mi2016-11-151-2/+2
| | | | | | | | | | | | | | | When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence return non-null but different WideAddRec, if getWideRecurrence is called before getExtendedOperandRecurrence, we won't bother to call getExtendedOperandRecurrence again. But as we know, it is possible that after SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned by getWideRecurrence. Meanwhile, if getExtendedOperandRecurrence returns a non-null WideAddRec, we know for sure that it is legal to do widening for the current instruction. So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which will increase the chance of successful widening. Differential Revision: https://reviews.llvm.org/D26059 llvm-svn: 286987
* [ARM] GlobalISel: Remove unused members. NFCIDiana Picus2016-11-153-8/+4
| | | | | | This silences some warnings that I didn't see with my host compiler. llvm-svn: 286981
* [AVX-512] Add AVX-512 vector shift intrinsics to memory sanitizer.Craig Topper2016-11-151-0/+31
| | | | | | Just needed to add the intrinsics to the existing switch. The code is generic enough to support the wider vectors with no changes. llvm-svn: 286980
* [X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)Simon Pilgrim2016-11-151-1/+4
| | | | | | | | | | | | This patch helps avoid poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so helps improve the truncation logic. This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases. Fix for PR13248 Differential Revision: https://reviews.llvm.org/D26583 llvm-svn: 286979
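The scalar shape of the signum idiom this targets, as an illustration (the patch itself operates on the vectorized DAG, not on C source):

    #include <cassert>

    // (x > 0) - (x < 0) yields -1, 0 or 1 as an integer; converting that to
    // float is the sint_to_fp-of-a-boolean-compare pattern in question.
    float signum(float X) {
      return static_cast<float>((X > 0.0f) - (X < 0.0f));
    }

    int main() {
      assert(signum(-3.5f) == -1.0f && signum(0.0f) == 0.0f && signum(2.0f) == 1.0f);
      return 0;
    }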
* Revert "[JumpThreading] Unfold selects that depend on the same condition"Pablo Barrio2016-11-151-77/+38
| | | | | | This reverts commit ac54d0066c478a09c7cd28d15d0f9ff8af984afc. llvm-svn: 286976
* Revert "[JumpThreading] Prevent non-deterministic use lists"Pablo Barrio2016-11-151-7/+8
| | | | | | This reverts commit f2c2f5354070469dac253373c66527ca971ddc66. llvm-svn: 286975
* [ARM] Make sure GlobalISel is only initialized once. NFCIDiana Picus2016-11-151-12/+12
| | | | | | | | | Move some code inside the proper 'if' block to make sure it is only run once, when the subtarget is first created. Things can still break if we use different ARM target machines or if we have functions with different 'target-cpu' or 'target-features'; we should fix that too in the future. llvm-svn: 286974
* [LoopVectorizer] When estimating reg usage, unused insts may "end" another useRobert Lougher2016-11-151-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | The register usage algorithm incorrectly treats instructions whose value is not used within the loop (e.g. those that do not produce a value). The algorithm first calculates the usages within the loop. It iterates over the instructions in order, and records at which instruction index each use ends (in fact, they're actually recorded against the next index, as this is when we want to delete them from the open intervals). The algorithm then iterates over the instructions again, adding each instruction in turn to a list of open intervals. Instructions are then removed from the list of open intervals when they occur in the list of uses ended at the current index. The problem is, instructions which are not used in the loop are skipped. However, although they aren't used, the last use of a value may have been recorded against that instruction index. In this case, the use is not deleted from the open intervals, which may then bump up the estimated register usage. This patch fixes the issue by simply moving the "is used" check after the loop which erases the uses at the current index. Differential Revision: https://reviews.llvm.org/D26554 llvm-svn: 286969
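A compact model of the interval sweep described above (a hypothetical helper, not the LoopVectorizer code): the essential point of the fix is that intervals ending at the current index are closed before the "does this instruction open an interval?" check is allowed to skip the instruction.

    #include <algorithm>
    #include <map>
    #include <set>
    #include <vector>

    // ProducesUsedValue[i] says whether instruction i defines a value used in
    // the loop; EndsAt[i] lists the instructions whose last use is recorded at
    // index i. Returns the maximum number of simultaneously open intervals.
    unsigned maxOpenIntervals(const std::vector<bool> &ProducesUsedValue,
                              const std::map<unsigned, std::vector<unsigned>> &EndsAt) {
      std::set<unsigned> Open;
      unsigned MaxOpen = 0;
      for (unsigned Idx = 0; Idx < ProducesUsedValue.size(); ++Idx) {
        // Close intervals ending here first, even if instruction Idx itself
        // produces no used value.
        auto It = EndsAt.find(Idx);
        if (It != EndsAt.end())
          for (unsigned Ended : It->second)
            Open.erase(Ended);
        if (!ProducesUsedValue[Idx]) // only now is it safe to skip
          continue;
        Open.insert(Idx);
        MaxOpen = std::max<unsigned>(MaxOpen, Open.size());
      }
      return MaxOpen;
    }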
* [PowerPC] Implement BE VSX load/store builtins - llvm portion.Tony Jiang2016-11-152-0/+15
| | | | | | | | | | | | | This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE, they behave exactly the same as vec_xl and vec_xst; therefore they are simply implemented by defining a matching macro. On LE, they are implemented by defining new builtins and intrinsics. For int/float/long long/double, it is just a load (lxvw4x/lxvd2x) or store (stxvw4x/stxvd2x). For char/char/short, we also need some extra shuffling before or after calling the builtins to get the desired BE order. For int128, simply call vec_xl or vec_xst. llvm-svn: 286967
* Get GlobalISel to build on Linux after r286407Diana Picus2016-11-151-1/+1
| | | | | | | r286407 has introduced calls to llvm::AddLandingPadInfo, which lives in the SelectionDAG component. Add it to LLVMBuild to avoid linker failures on Linux. llvm-svn: 286962
* [X86][FastISel] Assert that we are dealing with arithmetic with overflow intrinsics. NFCZvi Rackover2016-11-151-0/+3
| | | | llvm-svn: 286961
* [AMDGPU] TableGen: change individual instruction flags to bit type from bits<1>Sam Kolton2016-11-153-47/+47
| | | | | | | | | Summary: This is needed to be able to use these flags in InstrMappings. Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26666 llvm-svn: 286960
* [X86][FastISel] Fix lowering of overflow result on AVX512 targetsZvi Rackover2016-11-151-2/+2
| | | | | | | | | | | | | | | | Summary: Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class. We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction. Fixes pr30981. Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet Subscribers: qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D26620 llvm-svn: 286958
* Test commit, remove trailing space.Florian Hahn2016-11-151-1/+1
| | | | | | This commit is used to test commit access. llvm-svn: 286957
* Introduce TLI predicate for base-relative Jump Tables.Joerg Sonnenberger2016-11-153-38/+6
| | | | | | | | | | | For 64bit ABIs it is common practice to use relative Jump Tables with potentially different relocation bases. As the logic for the jump table itself doesn't depend on the relocation base, make it easier for targets to use the generic logic. Start by dropping the now redundant MIPS logic. Differential Revision: https://reviews.llvm.org/D26578 llvm-svn: 286951
* [ARM] Add machine scheduler for Cortex-R52 Javed Absar2016-11-153-1/+985
| | | | | | | | | | | | | This patch adds the Sched Machine Model for Cortex-R52. Details of the pipeline and descriptions are in comments in file ARMScheduleR52.td included in this patch. Reviewers: rengolin, jmolloy Differential Revision: https://reviews.llvm.org/D26500 llvm-svn: 286949
* DAGCombiner: fix combine of trunc and selectAsaf Badouh2016-11-151-1/+1
| | | | | | | | | | | | | bugzilla: https://llvm.org/bugs/show_bug.cgi?id=29002 pr29002 Differential Revision: https://reviews.llvm.org/D26449 llvm-svn: 286938
* TableGen: Add operator !orMatt Arsenault2016-11-154-2/+9
| | | | llvm-svn: 286936
* [X86][GlobalISel] Add minimal call lowering support to the IRTranslatorZvi Rackover2016-11-157-2/+196
| | | | | | | | | | | | | | | Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573. Reviewers: ab, qcolombet, t.p.northover Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26593 llvm-svn: 286935
* [X86] Add LLVM version number for each intrinsic handled by auto upgrade for age tracking.Craig Topper2016-11-151-152/+158
| | | | | | | One day we'd like to remove some of this autoupgrade support and it will be easier if we know how long some of it has been around. Differential Revision: https://reviews.llvm.org/D26321 llvm-svn: 286933
* AMDGPU: Fix f16 fabs/fnegMatt Arsenault2016-11-152-4/+18
| | | | llvm-svn: 286931
* Simplify identify_magic.Rui Ueyama2016-11-151-26/+23
| | | | | | This patch defines a memcmp-ish helper function to simplify identify_magic. llvm-svn: 286928
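A sketch of what such a helper might look like (the names here are made up; this is not the actual identify_magic code):

    #include <cstring>
    #include <string>

    // Does Buffer start with the given magic bytes?
    static bool startsWithMagic(const std::string &Buffer, const char *Magic,
                                size_t Len) {
      return Buffer.size() >= Len && std::memcmp(Buffer.data(), Magic, Len) == 0;
    }

    // Chains of hand-written byte comparisons collapse into one readable call.
    static bool looksLikeELF(const std::string &Buffer) {
      return startsWithMagic(Buffer, "\x7f" "ELF", 4);
    }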
* Improve DWARF parsing speed by improving DWARFAbbreviationDeclarationGreg Clayton2016-11-153-36/+132
| | | | | | | | | | | | | | | | | | This patch gets a DWARF parsing speed improvement by having DWARFAbbreviationDeclaration instances know if they have a fixed byte size. If an abbreviation has a fixed byte size that can be calculated given a DWARFUnit, then parsing a DIE becomes two steps: parse the ULEB128 abbrev code, and then add the constant size to the offset. This patch also adds a fixed byte size to each DWARFAbbreviationDeclaration::AttributeSpec so that attributes can quickly skip their values if needed without the need to look up the fixed form size. Notable improvements: - DWARFAbbreviationDeclaration::findAttributeIndex() now returns an Optional<uint32_t> instead of a uint32_t and we no longer have to look for the magic -1U return value - Optional<uint32_t> DWARFAbbreviationDeclaration::findAttributeIndex(dwarf::Attribute attr) const; - DWARFAbbreviationDeclaration now has a getAttributeValue() function that extracts an attribute value given a DIE offset that takes advantage of the DWARFAbbreviationDeclaration::AttributeSpec::ByteSize - bool DWARFAbbreviationDeclaration::getAttributeValue(const uint32_t DIEOffset, const dwarf::Attribute Attr, const DWARFUnit &U, DWARFFormValue &FormValue) const; - A DWARFAbbreviationDeclaration instance can return a fixed byte size for itself so DWARF parsing is faster: - Optional<size_t> DWARFAbbreviationDeclaration::getFixedAttributesByteSize(const DWARFUnit &U) const; - Any functions that used to take a "const DWARFUnit *U" that would crash if U was NULL now take a "const DWARFUnit &U" and are only called with a valid DWARFUnit Differential Revision: https://reviews.llvm.org/D26567 llvm-svn: 286924
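A hedged sketch of why a fixed abbreviation size helps (the struct and function below are invented for illustration and are not the DWARFAbbreviationDeclaration API): skipping a DIE whose abbreviation has a known fixed size is a single addition, while the general case has to decode every attribute form.

    #include <cstdint>

    struct AbbrevSizeInfo {
      bool HasFixedSize;   // true if all attribute forms have known sizes
      uint64_t FixedSize;  // total size of the attribute values, if fixed
    };

    // Offset points just past the DIE's ULEB128 abbreviation code.
    // skipFormValues stands in for the slow per-attribute walk.
    uint64_t skipDIEValues(uint64_t Offset, const AbbrevSizeInfo &AI,
                           uint64_t (*skipFormValues)(uint64_t)) {
      if (AI.HasFixedSize)
        return Offset + AI.FixedSize; // fast path: constant-time skip
      return skipFormValues(Offset);  // slow path: decode each form
    }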
* Fix -Wswitch.Rui Ueyama2016-11-152-0/+2
| | | | llvm-svn: 286920
* Add a file magic for CL.exe's object file created with /GL.Rui Ueyama2016-11-152-11/+8
| | | | | | | | | This patch makes it possible to identify object files created by CL.exe with the /GL option. Such files contain Microsoft's proprietary intermediate code instead of target machine code, in order to do LTO. I need this to print out a user-friendly error message from LLD. Differential Revision: https://reviews.llvm.org/D26645 llvm-svn: 286919