path: root/llvm/lib
...
* [CGP] Build the DominatorTree lazily (Teresa Johnson, 2019-03-25; 1 file, -34/+39)

  Summary: In r355512, CGP was changed to build the DominatorTree only once per
  function traversal, to avoid repeatedly building it each time it was accessed.
  This solved one compile-time issue but introduced another: we were now building
  the DT unnecessarily many times when we performed many function traversals
  (i.e. more than once per function when running CGP, because of changes made
  each time). Change to saving the DT in the CodeGenPrepare object and building
  it lazily when needed; it is reset whenever we need to rebuild it.

  In the case that exposed the issue there are 617 functions, and we walk them
  (i.e. execute the "while (MadeChange)" loop in runOnFunction) a total of 12083
  times, so previously we were building the DT 12083 times. With this patch we
  only build the DT 844 times (an average of 1.37 times per function). The total
  time to compile this file dropped from 538.11s without this patch to 339.63s
  with it.

  There is still an issue, as CGP is taking much longer than all other passes
  even with this patch; before a recent compiler release cut at r355392, this
  compile took only 97s in total, with CGP accounting for far less of the time.
  I suspect one of the other recent changes to CGP led to iterating each
  function many more times on average, but that needs more investigation.

  Reviewers: spatel
  Subscribers: jdoerfert, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59696
  llvm-svn: 356937

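  A minimal sketch of the lazy-construction pattern described above (the member
  and helper names are illustrative assumptions, not necessarily the exact patch):

      // Cache the DominatorTree in the pass object; build on demand, reset on change.
      class CodeGenPrepare : public FunctionPass {
        std::unique_ptr<DominatorTree> DT; // illustrative member
        DominatorTree &getDT(Function &F) {
          if (!DT)
            DT = llvm::make_unique<DominatorTree>(F); // built at most once per invalidation
          return *DT;
        }
        void invalidateDT() { DT.reset(); } // after any transform that changes the CFG
      };

  Each "while (MadeChange)" iteration then pays for a DT build only if something
  actually invalidated it.
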
* Moved everything SMT-related to LLVM and updated the cmake scripts (Mikhail R. Gadelha, 2019-03-25; 2 files, -1/+842)

  Differential Revision: https://reviews.llvm.org/D54978
  llvm-svn: 356929

* MISched: Don't schedule regions with 0 instructions (Matt Arsenault, 2019-03-25; 1 file, -2/+6)

  I think this is correct, but it may not be the right fix for the assertion I'm
  actually trying to solve. If a scheduling region was found that contains only
  dbg_value instructions, the RegPressure tracker would end up in an inconsistent
  state, because it skips over debug instructions and would point to an
  instruction outside the scheduling region. It may still be possible for this to
  happen if there are real schedulable instructions between dbg_values, but I
  haven't managed to break it that way.

  The testcase is extremely sensitive, and I'm not sure how to make it more
  resistant to future scheduler changes that would avoid stressing this
  situation.
  llvm-svn: 356926

* AMDGPU: Preserve LiveIntervals in WQM (Matt Arsenault, 2019-03-25; 1 file, -0/+2)

  This seems to already be done, but wasn't marked.
  llvm-svn: 356922

* [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization (Simon Pilgrim, 2019-03-25; 1 file, -7/+2)

  Remove attempts to commute non-Instructions to the LHS - the codegen changes
  appear to rely on chance more than anything else, and also tend to fight
  existing instcombine canonicalization, which moves constants to the RHS of
  commutable binary ops.

  This is prep work towards:
  (a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the
      similar reorderAltShuffleOperands
  (b) improving reordering to optimize cases with commutable and non-commutable
      instructions to still find splat/consecutive ops

  Differential Revision: https://reviews.llvm.org/D59738
  llvm-svn: 356913

* MinidumpYAML.cpp: Fix some code standard violations missed during review (Pavel Labath, 2019-03-25; 1 file, -12/+12)

  Functions should begin with lower-case letters. NFC.
  llvm-svn: 356901

* [DebugInfo] IntelJitEventListener follow up for "add SectionedAddress ..." (Brock Wyma, 2019-03-25; 1 file, -3/+13)

  Following r354972, the Intel JIT Listener would not report line table
  information because the section indices did not match. There was a similar
  issue with the PerfJitEventListener. This change performs the section index
  lookup when building the object address used to query the line table
  information.

  Differential Revision: https://reviews.llvm.org/D59490
  llvm-svn: 356895

* [MIPS GlobalISel] Select copy for arguments from FPRBRegBank (Petar Avramovic, 2019-03-25; 1 file, -5/+15)

  Move selectCopy into the MipsInstructionSelector class. Select copy for
  arguments from FPRBRegBank for MIPS32.

  Differential Revision: https://reviews.llvm.org/D59644
  llvm-svn: 356886

* [MIPS GlobalISel] Add floating point register bank (Petar Avramovic, 2019-03-25; 2 files, -0/+7)

  Add a floating point register bank for MIPS32. Implement
  getRegBankFromRegClass for float register classes.

  Differential Revision: https://reviews.llvm.org/D59643
  llvm-svn: 356883

* [MIPS GlobalISel] Lower float and double arguments in registers (Petar Avramovic, 2019-03-25; 2 files, -36/+98)

  Lower float and double arguments in registers for MIPS32. When a float/double
  argument is passed through GPR registers, select the appropriate move
  instruction.

  Differential Revision: https://reviews.llvm.org/D59642
  llvm-svn: 356882

* Fix the build with GCC 4.8 after r356783 (Hans Wennborg, 2019-03-25; 1 file, -1/+1)

  llvm-svn: 356875

* [ARM GlobalISel] 64-bit memops should be aligned (Diana Picus, 2019-03-25; 1 file, -9/+10)

  We currently use only VLDR/VSTR for all 64-bit loads/stores, so the memory
  operands must be word-aligned. Mark aligned operations as legal and narrow
  non-aligned ones to 32 bits.

  While we're here, also mark non-power-of-2 loads/stores as unsupported.
  llvm-svn: 356872

* [X86] Update some of the getMachineNode calls from X86ISelDAGToDAG to also include a VT for an EFLAGS result (Craig Topper, 2019-03-25; 1 file, -8/+9)

  This makes the nodes consistent with how they would be emitted from the isel
  table.
  llvm-svn: 356870

* [X86] When selecting (x << C1) op C2 as (x op (C2 >> C1)) << C1, use the operation VT for the target constant (Craig Topper, 2019-03-25; 1 file, -1/+2)

  Normally when the nodes we use here (AND32ri8, for example) are selected, their
  immediates are just converted from ConstantSDNode to TargetConstantSDNode
  without changing the VT from the original operation VT. So we should still be
  emitting them with the operation VT.

  Theoretically this could expose more accurate opportunities for CSE.
  llvm-svn: 356869

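  For concreteness, a worked instance of the selection pattern in the title (my
  example, not from the commit): with op = AND, C1 = 8 and C2 = 0x3F00,

      (x << 8) & 0x3F00  ==  (x & 0x3F) << 8

  Both sides place bits 0-5 of x into bits 8-13 of the result, and the shrunken
  constant 0x3F fits a sign-extended 8-bit immediate form such as AND32ri8's.
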
* [X86] Remove GetLo8XForm and use GetLo32XForm instead. NFCI (Craig Topper, 2019-03-25; 1 file, -6/+1)

  We were using this to create an AND32ri8 node from a 64-bit AND, but that node
  normally still uses a 32-bit immediate, so we should just truncate the existing
  immediate to i32. We already verified it has the same value in bits 31:7.
  llvm-svn: 356868

* [X86] Remove a couple unused SDNodeXForms. NFC (Craig Topper, 2019-03-25; 1 file, -11/+0)

  llvm-svn: 356867

* Revert r356688 "[X86] Don't avoid folding multiple use sign extended 8-bit immediate into instructions under optsize." (Craig Topper, 2019-03-25; 3 files, -5/+18)

  Looking back over how the one-use optimization works, I don't think this is the
  right way to fix this.
  llvm-svn: 356866

* [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) (Simon Pilgrim, 2019-03-24; 1 file, -28/+33)

  Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the
  PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (a
  port5 bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an
  extra register move - a good tradeoff, as these are pretty cheap and in most
  cases don't increase register pressure.

  This also exposed a missed opportunity to combine to ZERO_EXTEND_VECTOR_INREG
  with folded loads - even if we're in the float domain.
  llvm-svn: 356864

* [WebAssembly] Rename a variable in CFGSort (NFC) (Heejin Ahn, 2019-03-24; 1 file, -4/+4)

  Class `RegionInfo` was `SortUnitInfo` before, so the variables were named
  `SUI`. Now that the class name is `RegionInfo`, this renames `SUI` to `RI` to
  match the class name.
  llvm-svn: 356861

* [LegalizeDAG] Expand i16 bswap directly to a rotate by 8 instead of relying on DAG combine (Craig Topper, 2019-03-24; 1 file, -3/+2)

  An i16 bswap can be implemented with an i16 rotate by 8. We previously emitted
  a shift-and-OR sequence that DAG combine should be able to turn back into a
  rotate, but we might as well go there directly. If rotate isn't legal,
  LegalizeDAG will further legalize it to either the opposite rotate or the
  shift-and-OR pattern.

  I don't know of any way to make the existing DAG combine reliance fail, so I
  don't know how to add new tests for this that wouldn't have passed previously.
  llvm-svn: 356860

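  The scalar equivalence behind the change, as a sketch (my illustration, not the
  LegalizeDAG code itself):

      #include <cstdint>
      // Byte-swapping a 16-bit value is exactly a rotate by 8: the
      // shift-and-OR form below computes the same result as rol/ror 8.
      uint16_t bswap16(uint16_t X) {
        return (uint16_t)((X << 8) | (X >> 8));
      }
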
* [X86][AVX] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) (Simon Pilgrim, 2019-03-24; 1 file, -2/+14)

  Just enable this for AVX for now, as SSE41 introduces extra register moves for
  the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern (but otherwise helps reduce port5
  usage on Intel targets).

  Only AVX support is required for PR40685, as the issue is due to 8i8->8i32
  zext shuffle leftovers.
  llvm-svn: 356858

* [CGP] Make several static functions member functions (NFC) (Teresa Johnson, 2019-03-24; 1 file, -19/+25)

  This is extracted from D59696 as suggested in the review. It is preparation for
  making the DominatorTree a member variable.
  llvm-svn: 356857

* [x86] improve the default expansion of uaddsat/usubsat (Sanjay Patel, 2019-03-24; 1 file, -3/+31)

  This is yet another step towards solving PR14613:
  https://bugs.llvm.org/show_bug.cgi?id=14613

      uaddsat X, Y --> (X >u (X + Y)) ? -1 : X + Y
      usubsat X, Y --> (X >u Y) ? X - Y : 0

  We can't count on a sane vector ISA, so override the default (umin/umax)
  expansion of unsigned add/sub saturate in cases where we do not have umin/umax.

  Differential Revision: https://reviews.llvm.org/D59006
  llvm-svn: 356855

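  A scalar C++ rendering of the two expansions above (assuming 32-bit unsigned
  lanes; illustrative only):

      #include <cstdint>
      // uaddsat X, Y --> (X >u (X + Y)) ? -1 : X + Y
      uint32_t uaddsat(uint32_t X, uint32_t Y) {
        uint32_t Sum = X + Y;              // wraps on overflow
        return X > Sum ? UINT32_MAX : Sum; // overflow happened iff X >u X + Y
      }
      // usubsat X, Y --> (X >u Y) ? X - Y : 0
      uint32_t usubsat(uint32_t X, uint32_t Y) {
        return X > Y ? X - Y : 0;
      }
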
* [SLPVectorizer] shouldReorderOperands - just check for reordering. NFCI. (Simon Pilgrim, 2019-03-24; 1 file, -28/+24)

  Remove the I.getOperand() calls from inside shouldReorderOperands -
  reorderInputsAccordingToOpcode should handle the creation of the operand lists,
  and shouldReorderOperands should just check whether the i'th element should be
  commuted.
  llvm-svn: 356854

* [ConstantRange] Add getFull() + getEmpty() named constructors; NFC (Nikita Popov, 2019-03-24; 6 files, -81/+81)

  This adds ConstantRange::getFull(BitWidth) and
  ConstantRange::getEmpty(BitWidth) named constructors as more readable
  alternatives to the current ConstantRange(BitWidth, /* full */ false) and
  similar. Additionally, private getFull() and getEmpty() member functions are
  added which return a full/empty range with the same bit width - these are
  commonly needed inside ConstantRange.cpp.

  The IsFullSet argument in the ConstantRange(BitWidth, IsFullSet) constructor is
  now mandatory for the few usages that still make use of it.

  Differential Revision: https://reviews.llvm.org/D59716
  llvm-svn: 356852

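  Side by side, per the commit description (a sketch):

      unsigned BitWidth = 32;
      // Before: boolean flag at the call site.
      ConstantRange FullOld(BitWidth, /* full */ true);
      // After: named constructors.
      ConstantRange Full = ConstantRange::getFull(BitWidth);
      ConstantRange Empty = ConstantRange::getEmpty(BitWidth);
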
* Fix unused variable warning on non-asserts builds. NFCI. (Simon Pilgrim, 2019-03-23; 1 file, -4/+3)

  llvm-svn: 356841

* Remove unused function argument. NFCI. (Simon Pilgrim, 2019-03-23; 1 file, -5/+7)

  llvm-svn: 356840

* [DWARF] Delete a stray break and a stray comment. NFC (Fangrui Song, 2019-03-23; 1 file, -2/+1)

  llvm-svn: 356838

* [x86] reduce code duplication; NFC (Sanjay Patel, 2019-03-23; 1 file, -3/+5)

  llvm-svn: 356836

* [SLPVectorizer] reorderInputsAccordingToOpcode - use InstructionState directly. NFCI. (Simon Pilgrim, 2019-03-23; 1 file, -3/+6)

  llvm-svn: 356832

* [LowerSwitch] Use ConstantRange::fromKnownBits(); NFC (Nikita Popov, 2019-03-23; 1 file, -9/+3)

  Using an unsigned range to stay NFC, but a signed range would really be more
  useful here.
  llvm-svn: 356831

* [SLPVectorizer] Don't repeat VL.size() call. NFCI. (Simon Pilgrim, 2019-03-23; 1 file, -1/+1)

  llvm-svn: 356830

* [DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces" (Alexey Lapshin, 2019-03-23; 2 files, -0/+29)

  [Symbolizer] Add getModuleSectionIndexForAddress() helper routine

  The https://reviews.llvm.org/D58194 patch changed the symbolizer interface. In
  particular, it now requires not only an Address but a SectionIndex as well.
  Note the object::SectionedAddress parameter:

      Expected<DILineInfo> symbolizeCode(const std::string &ModuleName,
                                         object::SectionedAddress ModuleOffset,
                                         StringRef DWPName = "");

  There are callers of the symbolizer which do not know the particular section
  index. This patch creates a getModuleSectionIndexForAddress() routine which
  detects the section index for a specified address. Thus, if the caller sets
  ModuleOffset.SectionIndex to object::SectionedAddress::UndefSection, the
  symbolizer will detect the section index using
  getModuleSectionIndexForAddress.

  Differential Revision: https://reviews.llvm.org/D58848
  llvm-svn: 356829

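  A hedged sketch of the caller-side behavior described above (the Symbolizer
  object and the module name are assumptions for illustration):

      object::SectionedAddress Addr;
      Addr.Address = 0x401000; // example address
      // Leave the section undefined; the symbolizer now resolves it
      // internally via getModuleSectionIndexForAddress().
      Addr.SectionIndex = object::SectionedAddress::UndefSection;
      Expected<DILineInfo> LineInfo = Symbolizer.symbolizeCode("a.out", Addr);
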
* [Legacy][TimePasses] allow -time-passes reporting into a custom stream (Fedor Sergeev, 2019-03-22; 2 files, -12/+19)

  As a follow-up to the new-PM -time-passes fix (D59366), this adds similar
  functionality to the legacy time-passes: enhancing llvm::reportAndResetTimings
  to accept an optional stream for reporting output. By default it still reports
  into the stream created by CreateInfoOutputFile (-info-output-file).

  Also fixing it to actually reset after printing, as declared.

  Reviewed By: philip.pfaffe
  Differential Revision: https://reviews.llvm.org/D59416
  llvm-svn: 356824

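  A sketch of the extended call shape (the exact parameter type follows from the
  commit's "optional stream" description; treat as illustrative):

      std::string Buf;
      llvm::raw_string_ostream OS(Buf);
      llvm::reportAndResetTimings(&OS); // report into a custom stream
      llvm::reportAndResetTimings();    // default: CreateInfoOutputFile (-info-output-file)
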
* Followup for r356820 to fix the bots (Juergen Ributzka, 2019-03-22; 1 file, -1/+1)

  Try using a move constructor instead.
  llvm-svn: 356823

* [TextAPI] TBD Reader/Writer (Juergen Ributzka, 2019-03-22; 10 files, -0/+1351)

  Add basic infrastructure for reading and writing TBD files (versions 1-3). The
  TextAPI library is not used by anything yet (besides the unit tests). Tool
  support will be added in a separate commit.

  The TBD format is currently documented in the implementation file
  (TextStub.cpp).

  https://reviews.llvm.org/D53945

  Update: This contains changes to fix issues discovered by the bots:
  - add parentheses to silence warnings
  - rename variables
  - use PlatformType from BinaryFormat
  - try switching from a vector to an array to appease the bots
  - replace the tuple with a struct to work around an explicit constructor bug
  - fix an issue where we were leaking the YAML document if there was a parsing
    error

  Updated the license information in all files.
  llvm-svn: 356820

* [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree() (Simon Pilgrim, 2019-03-22; 1 file, -82/+171)

  This is a refactoring patch that removes the redundancy of performing operand
  reordering twice, once in buildTree() and later in vectorizeTree(). To achieve
  this we need to keep track of the operands within the TreeEntry struct while
  building the tree; later, in vectorizeTree(), we simply access them from the
  TreeEntry in the right order.

  This patch is the first in a series of patches that will allow for better
  operand reordering across chains of instructions (e.g., a chain of ADDs), as
  presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

  Patch by: @vporpo (Vasileios Porpodas)
  Differential Revision: https://reviews.llvm.org/D59059
  llvm-svn: 356814

* [TargetLowering] SimplifyDemandedBits trunc(srl(x, C1)) - early out for out of range C1. NFCI. (Simon Pilgrim, 2019-03-22; 1 file, -19/+19)

  llvm-svn: 356810

* [ARM] Don't form "ands" when it isn't scheduled correctly (Eli Friedman, 2019-03-22; 1 file, -1/+9)

  In r322972/r323136, the iteration here was changed to catch cases at the
  beginning of a basic block... but we accidentally deleted an important safety
  check. Restore that check to the way it was.

  Fixes https://bugs.llvm.org/show_bug.cgi?id=41116
  Differential Revision: https://reviews.llvm.org/D59680
  llvm-svn: 356809

* [X86] Use xmm registers to implement 64-bit popcnt on 32-bit targets, if possible, when the popcnt instruction is not available (Craig Topper, 2019-03-22; 1 file, -0/+22)

  On 32-bit targets without popcnt, we currently expand 64-bit popcnt to
  sequences of arithmetic and logic ops for each 32-bit half, and then add the
  32-bit halves together. If we have xmm registers we can use those to implement
  the operation instead, which results in fewer instructions than doing two
  separate 32-bit popcnt sequences.

  This mitigates some of PR41151 for the i64-on-i686 case when we have SSE2.

  Differential Revision: https://reviews.llvm.org/D59662
  llvm-svn: 356808

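  For reference, the scalar per-half expansion being replaced looks roughly like
  the classic bit-trick popcount (my sketch, not the exact DAG expansion):

      #include <cstdint>
      uint32_t popcnt32(uint32_t V) {
        V = V - ((V >> 1) & 0x55555555);                // 2-bit sums
        V = (V & 0x33333333) + ((V >> 2) & 0x33333333); // 4-bit sums
        V = (V + (V >> 4)) & 0x0F0F0F0F;                // 8-bit sums
        return (V * 0x01010101) >> 24;                  // add the four bytes
      }
      uint32_t popcnt64(uint64_t V) { // two 32-bit halves, added together
        return popcnt32((uint32_t)V) + popcnt32((uint32_t)(V >> 32));
      }
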
* [X86] Use movq for i64 atomic load on 32-bit targets when SSE2 is enabled (Craig Topper, 2019-03-22; 1 file, -5/+44)

  We used a lock cmpxchg8b to do i64 atomic loads. But if we have SSE2 we can do
  better and use a plain movq to do the load instead.

  I tried to just use an f64 atomic load and add isel patterns to MOVSD (which
  the domain fixing pass can turn into MOVQ), but the atomic_load SDNode in
  TargetSelectionDAG.td requires the type to be integer. So I've emitted
  VZEXT_LOAD instead, which should be selected by isel to a MOVQ. Hopefully we
  don't need a specific atomic flavor of this. I kept the memory operand from the
  original AtomicSDNode. I wasn't sure if I might need to set the MOVolatile
  flag?

  I've left some FIXMEs for improvements we can do without SSE2.

  Differential Revision: https://reviews.llvm.org/D59679
  llvm-svn: 356807

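  The user-visible shape of the change, as a sketch: a plain 64-bit atomic load
  on an SSE2-enabled 32-bit x86 target can now lower to a movq through an xmm
  register instead of lock cmpxchg8b.

      #include <atomic>
      #include <cstdint>
      std::atomic<int64_t> Counter;
      int64_t read() {
        return Counter.load(); // i686+SSE2: movq load rather than lock cmpxchg8b
      }
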
* Fix non-determinism in Reassociate caused by address coincidences (Daniel Sanders, 2019-03-22; 1 file, -5/+18)

  Summary: Between building the pair map and querying it, there are a few places
  that erase and create Values. It's rare, but the address of a newly created
  Value is occasionally the same as that of a just-erased Value that is still in
  the pair map. These coincidences should be accounted for to avoid
  non-determinism.

  Thanks to Roman Tereshin for the test case.

  Reviewers: rtereshin, bogner
  Reviewed By: rtereshin
  Subscribers: mgrang, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59401
  llvm-svn: 356803

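  The hazard in miniature (a generic illustration, not the pass's actual data
  structures):

      #include <map>
      // Raw pointers as map keys: after an erase, the allocator may hand the
      // freed address to a new object, so a stale entry can "match" it.
      std::map<void *, int> PairMap;
      void demo() {
        int *A = new int(1);
        PairMap[A] = 42; // keyed by address
        delete A;        // entry not erased; A's address may be reused...
        int *B = new int(2);
        bool FalseHit = PairMap.count(B) != 0; // ...possibly true by coincidence
        (void)FalseHit;
      }
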
* [AArch64, ARM] Add support for Exynos M5 (Evandro Menezes, 2019-03-22; 2 files, -0/+4)

  Add Exynos M5 support and test cases.
  llvm-svn: 356793

* [ARM] [NFC] Use tGPR in patterns where appropriate (Eli Friedman, 2019-03-22; 1 file, -11/+12)

  This doesn't have any practical effect at the moment, as far as I know, because
  high registers aren't allocatable in Thumb1 mode. But it might matter in the
  future.

  Differential Revision: https://reviews.llvm.org/D59675
  llvm-svn: 356791

* IR: Support parsing numeric block ids, and emit them in textual output (James Y Knight, 2019-03-22; 5 files, -14/+43)

  Just as LLVM IR supports explicitly specifying numeric value ids for
  instructions, and emits them by default in textual output, now do the same for
  blocks.

  This is a slightly incompatible change in the textual IR format. Previously,
  llvm would parse numeric labels as string names. E.g.

      define void @f() {
        br label %"55"
      55:
        ret void
      }

  defined a label *named* "55", even without needing to be quoted, while the
  reference required quoting. Now, if you intend a block label which looks like a
  value number to be a name, you must quote it in the definition too (e.g.
  `"55":`).

  Previously, llvm would print nameless blocks only as a comment, and would omit
  it if there was no predecessor. This could cause confusion for readers of the
  IR, just as unnamed instructions did prior to the addition of "%5 = " syntax,
  back in 2008 (PR2480). Now, it will always print a label for an unnamed block,
  with the exception of the entry block. (IMO it may be better to print it for
  the entry block as well. However, that requires updating many more tests.)

  Thus, the following is supported, and is the canonical printing:

      define i32 @f(i32, i32) {
        %3 = add i32 %0, %1
        br label %4

      4:
        ret i32 %3
      }

  New test cases covering this behavior are added, and other tests updated as
  required.

  Differential Revision: https://reviews.llvm.org/D58548
  llvm-svn: 356789

* [ValueTracking] Avoid redundant known bits calculation in computeOverflowForSignedAdd() (Nikita Popov, 2019-03-22; 1 file, -6/+8)

  We're already computing the known bits of the operands here. If the known bits
  of the operands can determine the sign bit of the result, we'll already catch
  this in signedAddMayOverflow(). The only other way we'll get new information
  from computing known bits on the whole add (as the comment already indicates)
  is if there's an assumption on it.

  As such, change the code to only compute known bits from assumptions, instead
  of computing full known bits on the add (which would unnecessarily recompute
  the known bits of the operands as well).

  Differential Revision: https://reviews.llvm.org/D59473
  llvm-svn: 356785

* [X86] lowerShuffleAsBitMask - ensure float bit masks are the correct width (PR41203) (Simon Pilgrim, 2019-03-22; 1 file, -5/+5)

  llvm-svn: 356784

* [AliasAnalysis] Second prototype to cache BasicAA / anyAA state (Alina Sbirlea, 2019-03-22; 11 files, -218/+341)

  Summary: Adding contained caching to AliasAnalysis. BasicAA is currently the
  only one using it.

  AA changes:
  - This patch pulls the caches from BasicAAResults up to AAResults, meaning the
    getModRefInfo call benefits from the IsCapturedCache as well when in "batch
    mode".
  - All AAResultBase implementations add the QueryInfo member to all APIs.
    AAResults APIs maintain wrapper APIs such that all alias()/getModRefInfo
    call sites are unchanged.
  - AA now provides a BatchAAResults type as a wrapper to AAResults. It keeps
    the AAResults instance and a QueryInfo instantiated to batch mode. It
    delegates all work to the AAResults instance with the batched QueryInfo.
    More API wrappers may be needed in BatchAAResults; only the minimum needed
    is currently added. (See the usage sketch after this entry.)

  MemorySSA changes:
  - All walkers are now templated on the AA used (AliasAnalysis=AAResults or
    BatchAAResults).
  - At build time, we optimize uses; now we create a local walker (which lives
    only as long as OptimizeUses does) using BatchAAResults.
  - All walkers have an internal AA and only use that now, never the AA in
    MemorySSA. The walkers receive the AA they will use when built.
  - The walker we use for queries after the build is instantiated on
    AliasAnalysis and is built after building MemorySSA and setting AA.
  - All static methods doing walking are now templated on AliasAnalysisType if
    they are used both during build and after. If used only during build, the
    method now takes only a BatchAAResults. If used only after build, the
    method now takes an AliasAnalysis.

  Subscribers: sanjoy, arsenm, jvesely, nhaehnle, jlebar, george.burgess.iv,
  llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59315
  llvm-svn: 356783

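  A hedged sketch of the batch-mode wrapper described above (names taken from
  the commit text; the exact API surface may differ):

      // Wrap an AAResults with persistent query state so that repeated
      // queries within one batch can share cached results.
      BatchAAResults BatchAA(AA); // AA is an llvm::AAResults &
      AliasResult R1 = BatchAA.alias(LocA, LocB); // LocA/LocB: MemoryLocations
      AliasResult R2 = BatchAA.alias(LocA, LocC); // may hit the batch cache
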
* [ConstantFolding] Fix GetConstantFoldFPValue to avoid cast overflow (Bixia Zheng, 2019-03-22; 1 file, -4/+2)

  Summary: In C++, the behavior of casting a double value that is beyond the
  range of single-precision floating-point to a float value is undefined. This
  change replaces such a cast with APFloat::convert to convert the value, which
  is consistent with how we convert a double value to a half value.

  Reviewers: sanjoy
  Subscribers: lebedev.ri, sanjoy, jlebar, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59500
  llvm-svn: 356781

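  The safe-narrowing idiom the commit describes, sketched with APFloat (the
  rounding mode is chosen for illustration):

      #include "llvm/ADT/APFloat.h"
      using namespace llvm;
      float narrowToFloat(double V) {
        bool LosesInfo;
        APFloat F(V);
        // Well-defined narrowing, unlike a raw (float)V cast on out-of-range
        // values; out-of-range inputs round rather than invoking UB.
        F.convert(APFloat::IEEEsingle(), APFloat::rmNearestTiesToEven,
                  &LosesInfo);
        return F.convertToFloat();
      }
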
* InstCombineSimplifyDemanded: Allow v3 results for AMDGCN buffer and image intrinsics (Tim Renouf, 2019-03-22; 1 file, -2/+1)

  This helps avoid the situation where RA spots that only 3 of the v4f32 result
  of a load are used, and immediately reallocates the 4th register for something
  else, requiring a stall waiting for the load.

  Differential Revision: https://reviews.llvm.org/D58906
  Change-Id: I947661edfd5715f62361a02b100f14aeeada29aa
  llvm-svn: 356768