| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
| |
they're split on AVX1
This needs to be generalised further to support AVX512BW cases but I want to add non-uniform constants first.
llvm-svn: 324844
|
| |
|
|
|
|
|
|
|
|
|
| |
The related cases for (X * Y) / X were handled in rL124487.
https://rise4fun.com/Alive/6k9
The division in these tests is subsequently eliminated by existing instcombines
for 1/X.
llvm-svn: 324843
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently we only use min/max to help with ule/uge compares because it removes an invert of the result that would otherwise be needed. But we can also use it for ult/ugt compares if it will prevent the need for a sign bit flip needed to use pcmpgt at the cost of requiring an invert after the compare.
I also refactored the code so that the max/min code is self contained and does its own return instead of setting up a flag to manipulate the rest of the function's behavior.
Most of the test cases look ok with this. I did notice that we added instructions when one of the operands being sign flipped is a constant vector that we were able to constant fold the flip into.
I also noticed that sometimes the SSE min/max clobbers a register that is needed after the compare. This resulted in an extra move being inserted before the min/max to preserve the register. We could try to detect this and switch from min to max and change the compare operands to use the operand that gets reused in the compare.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42935
llvm-svn: 324842
|
| |
|
|
|
|
| |
NFCI.
llvm-svn: 324841
|
| |
|
|
|
|
| |
The related cases for (X * Y) / X were handled in rL124487.
llvm-svn: 324840
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bits
This reverses instcombine's demanded bits' transform which always tries to clear bits in constants.
As noted in PR35792 and shown in the test diffs:
https://bugs.llvm.org/show_bug.cgi?id=35792
...we can do better in codegen by trying to form -1. The x86 sub test shows a missed opportunity.
I did investigate changing instcombine's behavior, but it would be more work to change
canonicalization in IR. Clearing bits / shrinking constants can allow killing instructions,
so we'd have to figure out how to not regress those cases.
Differential Revision: https://reviews.llvm.org/D42986
llvm-svn: 324839
|
| |
|
|
| |
llvm-svn: 324838
|
| |
|
|
|
|
|
|
| |
This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT.
Differential Revision: https://reviews.llvm.org/D43014
llvm-svn: 324837
|
| |
|
|
|
|
|
| |
https://reviews.llvm.org/rL324835
Change-Id: I2526c2f342654e85ce054237de03ae9db9ab4994
llvm-svn: 324836
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748
llvm-svn: 324835
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bits and 512 bits aren't required
This patch adds a new function attribute "required-vector-width" that can be set by the frontend to indicate the maximum vector width present in the original source code. The idea is that this would be set based on ABI requirements, intrinsics or explicit vector types being used, maybe simd pragmas, etc. The backend will then use this information to determine if its save to make 512-bit vectors illegal when the preference is for 256-bit vectors.
For code that has no vectors in it originally and only get vectors through the loop and slp vectorizers this allows us to generate code largely similar to our AVX2 only output while still enabling AVX512 features like mask registers and gather/scatter. The loop vectorizer doesn't always obey TTI and will create oversized vectors with the expectation the backend will legalize it. In order to avoid changing the vectorizer and potentially harm our AVX2 codegen this patch tries to make the legalizer behavior similar.
This is restricted to CPUs that support AVX512F and AVX512VL so that we have good fallback options to use 128 and 256-bit vectors and still get masking.
I've qualified every place I could find in X86ISelLowering.cpp and added tests cases for many of them with 2 different values for the attribute to see the codegen differences.
We still need to do frontend work for the attribute and teach the inliner how to merge it, etc. But this gets the codegen layer ready for it.
Differential Revision: https://reviews.llvm.org/D42724
llvm-svn: 324834
|
| |
|
|
|
|
|
|
| |
We promote these via a DAG combine now before lowering gets the chance.
Also remove the v2i1 custom handling since it will no longer be triggered.
llvm-svn: 324833
|
| |
|
|
|
|
|
|
| |
SelectionDAG::getBoolConstant in the one place it was used.
SelectionDAG::getBoolConstant was recently introduced. At the time I didn't know getConstTrueVal existed, but I think getBoolConstant is better as it will use the source VT to make sure it can properly detect floating point if it is configured differently.
llvm-svn: 324832
|
| |
|
|
|
|
|
|
| |
blocks. NFC
These were added as part of the refactoring for prefer vector width. At the time I thought the hasAVX512 here would be replaced with "allow 512 bit vectors" so that it would read "allow 512 bit vectors OR VLX". But now the plan is to only give the option of disabling 512 bit vectors when VLX is enabled. So we don't need this qualification at all
llvm-svn: 324831
|
| |
|
|
| |
llvm-svn: 324830
|
| |
|
|
|
|
| |
As discussed on D43014, we need the ability to flip SMIN/SMAX to (legal) UMIN/UMAX
llvm-svn: 324829
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
return vXi1 mask. Make bitcasts to scalar explicit in IR
Summary: This is the clang equivalent of r324827
Reviewers: zvi, delena, RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43143
llvm-svn: 324828
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vXi1 mask type to be closer to an fcmp.
Summary:
This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit and in the IR.
This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares.
Previously the lowering step of isel would turn the intrinsic into an X86 specific ISD node and a emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it.
This should make any issues with gpr<->mask sequences the same between integer and fp. Meaning we only have to fix them once.
Reviewers: spatel, delena, RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43137
llvm-svn: 324827
|
| |
|
|
|
|
| |
As discussed on D43014, we need the ability to flip UMIN/UMAX to (legal) SMIN/SMAX
llvm-svn: 324826
|
| |
|
|
|
|
| |
Includes adding m_NonNegative constant pattern matcher
llvm-svn: 324825
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This test would fail if the python path had spaces. Add a quote around the path to fix this problem and update some test values changed by the addition of quotes around the path.
Tested on Windows and Linux with Python 3.x
Reviewers: zturner, llvm-commits
Subscribers: klimek, cfe-commits
Differential Revision: https://reviews.llvm.org/D43164
llvm-svn: 324824
|
| |
|
|
|
|
| |
Until Skylake, most hardware could only issue a PMULLD op every other cycle
llvm-svn: 324823
|
| |
|
|
|
|
|
|
|
|
| |
(v4i1 (setcc (v4f32)))
Undef VLX, getSetCCResultType returns v2i1/v4i1 for v2f32/v4f32 so default type legalization will end up changing the setcc result type back to vXi1 if it had been extended. The resulting extend gets messed up further by type legalization and is difficult to recombine back to (v4i32 (setcc (v4f32))) after legalization.
I went ahead and enabled this for SSE2 and later since its always the result we want and this helps type legalization get there in less steps.
llvm-svn: 324822
|
| |
|
|
| |
llvm-svn: 324821
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
before type legalization.
This prevents extends of masks being introduced during lowering where it become difficult to combine them out.
There are a few oddities in here.
We sometimes concatenate two k-registers produced by two compares, sign_extend the combined pair, then extract two halves. This worked better previously because the sign_extend wasn't created until after the fp_to_sint was split which led to a split sign_extend being created.
We probably also need to custom type legalize (v2i32 (sext v2i1)) via widening.
llvm-svn: 324820
|
| |
|
|
|
|
|
|
|
|
| |
upcoming patch.
The update script sometimes has trouble when there are check-prefixes representing every possible combination of feature flags. I have a patch where the update script was generating something that didn't pass lit.
This patch just removes some check-prefixes and expands out some of the checks to workaround this.
llvm-svn: 324819
|
| |
|
|
|
|
| |
Ensure the scalar is correctly splatted to all lanes
llvm-svn: 324818
|
| |
|
|
|
|
|
| |
D43141 proposes to correct undef folding in the DAG,
and this test would not survive that change.
llvm-svn: 324817
|
| |
|
|
|
|
|
| |
D43141 proposes to correct undef folding in the DAG,
and this test would not survive that change.
llvm-svn: 324816
|
| |
|
|
| |
llvm-svn: 324815
|
| |
|
|
|
|
|
| |
D43141 proposes to correct undef folding in the DAG,
and this test would not survive that change.
llvm-svn: 324814
|
| |
|
|
| |
llvm-svn: 324813
|
| |
|
|
| |
llvm-svn: 324812
|
| |
|
|
|
|
| |
Noted by David CARLIER.
llvm-svn: 324811
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
textdomain is a part of -lintl on BSDs. In GLIBC it's in libc.
We assume that -lintl will need to be rebuilt with sanitizers
in order to sanitize programs using its features.
This is a proper continuation of D41013.
The original patch has been reverted (adding -lintl).
llvm-svn: 324810
|
| |
|
|
| |
llvm-svn: 324809
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a base-class called TemplateInstantiationObserver which gets
notified whenever a template instantiation is entered or exited during
semantic analysis. This is a base class used to implement the template
profiling and debugging tool called
Templight (https://github.com/mikael-s-persson/templight).
The patch also makes a few more changes:
* ActiveTemplateInstantiation class is moved out of the Sema class (so it can be used with inclusion of Sema.h).
* CreateFrontendAction function in front-end utilities is given external linkage (not longer a hidden static function).
* TemplateInstObserverChain data member added to Sema class to hold the list of template-inst observers.
* Notifications to the template-inst observer are added at the key places where templates are instantiated.
Patch by: Abel Sinkovics!
Differential Revision: https://reviews.llvm.org/D5767
llvm-svn: 324808
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: some compiler (msvc) treats Buffer.Buffer as constructor and refuse to compile. NFC
Authored by comicfans44.
Reviewers: rnk, dberris
Reviewed By: dberris
Subscribers: llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D40346
llvm-svn: 324807
|
| |
|
|
|
|
| |
Strangely the code was already present, just the setOperationAction wasn't being called without VLX.
llvm-svn: 324806
|
| |
|
|
|
|
|
|
|
|
| |
extend and a shift.
This avoids a constant pool load to create 1.
The int->float are showing converts to mask and back. We probably need to widen inputs to sint_to_fp/uint_to_fp before type legalization.
llvm-svn: 324805
|
| |
|
|
|
|
|
|
| |
then masking. A later DAG combine will convert to a shift.
This helps to avoid a constant pool load needed to zero extend from the mask.
llvm-svn: 324804
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This revision refactors 1. parser 2. CHECK line adder of utils/update_{,llc_}test_checks.py
so that thir functionality can be re-used by other utility scripts (e.g. D42712)
Reviewers: asb, craig.topper, RKSimon, echristo
Subscribers: llvm-commits, spatel
Differential Revision: https://reviews.llvm.org/D42805
llvm-svn: 324803
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Massive false positives were known to be caused by continuing the analysis
after a destructor with a noreturn attribute has been executed in the program
but not modeled in the analyzer due to being missing in the CFG.
Now that work is being done on enabling the modeling of temporary constructors
and destructors in the CFG, we need to make sure that the heuristic that
suppresses these false positives keeps working when such modeling is disabled.
In particular, different code paths open up when the corresponding constructor
is being inlined during analysis.
Differential Revision: https://reviews.llvm.org/D42779
llvm-svn: 324802
|
| |
|
|
|
|
|
|
|
|
| |
It was introduced when two -analyzer-config options were added almost
simultaneously in r324793 and r324668 and the option count was not
rebased correctly in the tests.
Fixes the buildbots.
llvm-svn: 324801
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The analyzer was relying on peeking the next CFG element during analysis
whenever it was trying to figure out what object is being constructed
by a given constructor. This information is now available in the current CFG
element in all cases that were previously supported by the analyzer,
so no complicated lookahead is necessary anymore.
No functional change intended - the context in the CFG should for now be
available if and only if it was previously discoverable via CFG lookahead.
Differential Revision: https://reviews.llvm.org/D42721
llvm-svn: 324800
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch from ngolovliov@gmail.com
Reviewed as: https://reviews.llvm.org/D42344
As described in llvm.org/PR30959, the current
implementation of std::{map, key}::{count, equal_range} in libcxx is
non-conforming. Quoting the C++14 standard [associative.reqmts]p3
> The phrase “equivalence of keys” means the equivalence relation imposed by
> the comparison and not the operator== on keys. That is, two keys k1 and k2 are
> considered to be equivalent if for the comparison object comp,
> comp(k1, k2) == false && comp(k2, k1) == false.
In the same section, the requirements table states the following:
> a.equal_range(k) equivalent to make_pair(a.lower_bound(k), a.upper_bound(k))
> a.count(k) returns the number of elements with key equivalent to k
The behaviour of libstdc++ seems to conform to the standard here.
llvm-svn: 324799
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we make it possible to query the CFG constructor element to find
information about the construction site, possible cleanup work represented by
ExprWithCleanups should not prevent us from providing this information.
This allows us to have a correct construction context for variables initialized
"by value" via elidable copy-constructors, such as 'i' in
iterator i = vector.begin();
Differential Revision: https://reviews.llvm.org/D42719
llvm-svn: 324798
|
| |
|
|
|
|
|
| |
All uses conservatively assume in early exit case that it will be a
predecessor. Changing default removes checking code in all uses.
llvm-svn: 324797
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
CFG elements for constructors of fields and base classes that are being
initialized before the body of the whole-class constructor starts can now be
queried to discover that they're indeed participating in initialization of their
respective fields or bases before the whole-class constructor kicks in.
CFG construction contexts are now capable of representing CXXCtorInitializer
triggers, which aren't considered to be statements in the Clang AST.
Differential Revision: https://reviews.llvm.org/D42700
llvm-svn: 324796
|
| |
|
|
|
|
|
|
|
| |
arch incompat with spec in file so it's rejected and the test fails.
will look into this later, will be a test case issue not a test issue;
test case may only be valid when lldb is built for/running on an x86_64
system.
llvm-svn: 324795
|