function.
We are not working on a DAG, and I ran into a number of problems when I enabled the vectorization of 'diamond-trees' (trees that share leaves).
* Improved the numbering API.
* Changed the placement of new instructions to the last root.
* Fixed a bug with external tree users in a non-zero lane.
* Fixed a bug in the placement of in-tree users.
llvm-svn: 182508
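The numbering and placement changes can be illustrated with a minimal sketch, assuming a hypothetical `BlockNumbering` helper that records each instruction's position in its basic block. The new vector code goes at whichever root comes last in that order, so the operands of every root are already defined there. Instructions are modeled as plain strings; `BlockNumbering` and `getLastRoot` are illustrative names, not the pass's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical numbering helper: records the position of every
// instruction (modeled as a string name) within one basic block.
class BlockNumbering {
public:
  explicit BlockNumbering(const std::vector<std::string> &Insts)
      : Block(Insts) {
    for (std::size_t I = 0; I < Block.size(); ++I)
      Index[Block[I]] = I;
  }

  // Position of an instruction; throws std::out_of_range if it is not
  // in this block.
  std::size_t getIndex(const std::string &Inst) const { return Index.at(Inst); }

  // Instruction at a given position.
  const std::string &getInstruction(std::size_t Idx) const {
    return Block.at(Idx);
  }

private:
  std::vector<std::string> Block;
  std::unordered_map<std::string, std::size_t> Index;
};

// Returns the root that appears last in the block. Emitting the new
// vector code at that point means the operands of every root are
// already defined.
std::string getLastRoot(const BlockNumbering &BN,
                        const std::vector<std::string> &Roots) {
  assert(!Roots.empty() && "need at least one root");
  std::size_t Last = BN.getIndex(Roots.front());
  for (const std::string &R : Roots)
    Last = std::max(Last, BN.getIndex(R));
  return BN.getInstruction(Last);
}
```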
read in asserts.
llvm-svn: 181689
llvm-svn: 181684
multiple users.
The external user does not have to be in lane #0. We have to save the lane for each scalar so that we know which vector lane to extract.
llvm-svn: 181674
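A minimal sketch of the fix, assuming a hypothetical `ExternalUser` record: every scalar folded into the vector remembers the lane it occupies, and an external user reads that lane instead of assuming lane #0. The vector register is modeled as a `std::array`; in IR the read would be an `extractelement` instruction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of a scalar that was folded into a vector but is
// still needed by a user outside the vectorized tree.
struct ExternalUser {
  std::string Scalar;  // the original scalar value
  std::string User;    // the instruction outside the tree that reads it
  std::size_t Lane;    // which vector lane now holds the scalar
};

// Saves the lane of every scalar in the bundle that has an outside user.
// Each use is a {scalar, outside user} pair.
std::vector<ExternalUser> collectExternalUsers(
    const std::vector<std::string> &Bundle,
    const std::vector<std::pair<std::string, std::string>> &Uses) {
  std::vector<ExternalUser> Result;
  for (std::size_t Lane = 0; Lane < Bundle.size(); ++Lane)
    for (const auto &U : Uses)
      if (U.first == Bundle[Lane])
        Result.push_back({Bundle[Lane], U.second, Lane});
  return Result;
}

// Models an extractelement: reads the saved lane rather than lane #0.
template <std::size_t N>
int extractForUser(const std::array<int, N> &Vec, const ExternalUser &EU) {
  assert(EU.Lane < N && "lane out of range");
  return Vec[EU.Lane];
}
```

With a four-lane bundle {a, b, c, d} and an outside call that reads c, the saved lane is 2, so the extract reads element 2 of the vector rather than element 0.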
round of vectorization.
Testcase in the next commit.
llvm-svn: 181673
For example:
void foo(int);
void bar(int *A, int *B, int i) {
  int a = A[i];
  int b = A[i+1];
  B[i] = a;
  B[i+1] = b;
  foo(a); // 'a' is used outside the vectorized expression.
}
llvm-svn: 181648
llvm-svn: 179975
with multiple users.
We did not terminate the switch case, so we executed the search routine twice.
llvm-svn: 179974
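The bug described above is an ordinary switch fall-through. A minimal sketch, with a hypothetical `Kind` enum and `Walker::search` routine standing in for the vectorizer's own code: without the `break`, the multi-user case falls into the next label and the search runs a second time.

```cpp
#include <cassert>

// Hypothetical classification of a value during the tree walk.
enum class Kind { MultiUser, SingleUser };

struct Walker {
  int SearchCalls = 0;  // how many times the search routine ran

  void search() { ++SearchCalls; }

  // Buggy: the MultiUser case is not terminated, so control falls
  // through into the SingleUser case and search() runs twice.
  void visitBuggy(Kind K) {
    switch (K) {
    case Kind::MultiUser:
      search();
      // missing 'break;'
    case Kind::SingleUser:
      search();
      break;
    }
  }

  // Fixed: each case ends with 'break', so search() runs exactly once.
  void visitFixed(Kind K) {
    switch (K) {
    case Kind::MultiUser:
      search();
      break;
    case Kind::SingleUser:
      search();
      break;
    }
  }
};
```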
llvm-svn: 179936
Avoids a couple of copies and allows more flexibility in the clients.
llvm-svn: 179935
llvm-svn: 179930
llvm-svn: 179928
llvm-svn: 179927
vector-gather sequence out of loops.
llvm-svn: 179562
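The hoisting idea can be sketched as a legality check, under the assumption that a gather (building a vector from scalars one lane at a time) may move to the loop preheader only when none of its scalar operands is defined inside the loop. `Value`, `Loop`, and `canHoistGather` are hypothetical stand-ins, not the pass's actual data structures.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each scalar records the basic block that defines it.
struct Value {
  std::string Name;
  std::string DefBlock;
};

// A loop is modeled as the set of basic blocks it contains.
struct Loop {
  std::set<std::string> Blocks;
  bool contains(const std::string &BB) const { return Blocks.count(BB) != 0; }
};

// A gather sequence is loop-invariant, and can therefore be emitted once
// in the preheader instead of on every iteration, only if every scalar it
// inserts into the vector is defined outside the loop.
bool canHoistGather(const std::vector<Value> &Operands, const Loop &L) {
  for (const Value &V : Operands)
    if (L.contains(V.DefBlock))
      return false;
  return true;
}
```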
llvm-svn: 179479
and add the cost of extracting values from the roots of the tree.
llvm-svn: 179475
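The cost rule in this message can be written as a small calculation. A minimal sketch, assuming per-bundle costs are supplied by the caller (the real pass queries target cost information) and that a negative total means vectorization is profitable: each bundle contributes its vector cost minus the scalar cost it replaces, and each root value still used outside the tree adds one extract. `BundleCost` and `getTreeCost` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-bundle costs, supplied by the caller.
struct BundleCost {
  int ScalarCost;  // total cost of the scalar instructions in the bundle
  int VectorCost;  // cost of the vector instruction replacing them
};

// Cost of vectorizing a tree: the difference for every bundle, plus the
// cost of extracting each root value that is still used outside the tree.
// A negative total means the vector form is cheaper.
int getTreeCost(const std::vector<BundleCost> &Bundles,
                int NumExternalUses, int ExtractCost) {
  int Cost = 0;
  for (const BundleCost &B : Bundles)
    Cost += B.VectorCost - B.ScalarCost;
  Cost += NumExternalUses * ExtractCost;
  return Cost;
}
```

For example, two bundles that each replace four scalar operations (cost 4) with one vector operation (cost 1) save 6 in total; two external uses at an extract cost of 1 give back 2, for a tree cost of -4.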
llvm-svn: 179470
perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from.
llvm-svn: 179414
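A minimal sketch of such a preliminary walk, assuming a hypothetical `Node` graph in which each node lists its operands and its users: the first pass collects every value reachable from the root through operands, and the second pass finds values with more than one user and checks whether each of those users was itself reached by the walk (that is, lies inside the tree).

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical expression node: operands are what it reads, users are
// every instruction (inside or outside the tree) that reads it.
struct Node {
  std::vector<Node *> Operands;
  std::vector<Node *> Users;
};

struct MultiUserInfo {
  std::set<Node *> MultiUsers;   // values with more than one user
  std::set<Node *> OutsideUsed;  // multi-user values read outside the tree
};

// First pass: collect every node reachable from N through operands.
static void collect(Node *N, std::set<Node *> &InTree) {
  if (!InTree.insert(N).second)
    return;  // already visited (shared operand)
  for (Node *Op : N->Operands)
    collect(Op, InTree);
}

// Second pass: find values with multiple users and check whether each
// user came from inside the tree.
MultiUserInfo analyzeMultiUsers(Node *Root) {
  std::set<Node *> InTree;
  collect(Root, InTree);
  MultiUserInfo Info;
  for (Node *N : InTree) {
    if (N->Users.size() < 2)
      continue;
    Info.MultiUsers.insert(N);
    for (Node *U : N->Users)
      if (!InTree.count(U))
        Info.OutsideUsed.insert(N);
  }
  return Info;
}
```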
llvm-svn: 179206
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:
1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).
2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.
3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.
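The chain analysis shared by all three users reduces to one question: do a group of memory accesses touch consecutive elements? A minimal sketch, assuming each access is already described as a (base pointer, element offset) pair; deriving that pair from real address computations is out of scope here. `Access` and `isConsecutiveChain` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one memory access: which array it touches
// and at which element offset.
struct Access {
  const void *Base;
  long Offset;
};

// True if the accesses, in the given order, touch adjacent elements of
// the same array, e.g. A[i], A[i+1], A[i+2], A[i+3].
bool isConsecutiveChain(const std::vector<Access> &Chain) {
  for (std::size_t I = 1; I < Chain.size(); ++I) {
    if (Chain[I].Base != Chain[0].Base)
      return false;
    if (Chain[I].Offset != Chain[I - 1].Offset + 1)
      return false;
  }
  return true;
}
```

A loop roller could stop at this analysis, while a vectorizer would go on to replace the chain with one wide memory operation.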
This patch also includes a simple (100 LOC) bottom-up SLP vectorizer that uses the infrastructure and can vectorize this code:
void SAXPY(int *x, int *y, int a, int i) {
x[i] = a * x[i] + y[i];
x[i+1] = a * x[i+1] + y[i+1];
x[i+2] = a * x[i+2] + y[i+2];
x[i+3] = a * x[i+3] + y[i+3];
}
llvm-svn: 179117
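The core of bottom-up SLP vectorization on the SAXPY example can be sketched as a recursive check that starts from the four consecutive stores and walks up the use-def chains: at every step all four lanes must perform the same operation, and their operands form the next bundle to check. The `Expr` tree and the `make` and `isVectorizableBundle` helpers are hypothetical stand-ins for IR instructions; the sketch assumes the seed stores were already found to be consecutive and only decides legality, without emitting vector code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scalar expression: an opcode ("store", "add", "mul",
// "load", or "splat" for a scalar shared by all lanes) and its operands.
struct Expr {
  std::string Op;
  std::vector<std::shared_ptr<Expr>> Operands;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr make(std::string Op, std::vector<ExprPtr> Ops = {}) {
  return std::make_shared<Expr>(Expr{std::move(Op), std::move(Ops)});
}

// Bottom-up check: a bundle (one expression per lane) is vectorizable if
// all lanes share an opcode and, operand by operand, the bundles formed
// from the lanes' operands are vectorizable too.
bool isVectorizableBundle(const std::vector<ExprPtr> &Bundle) {
  assert(!Bundle.empty() && "bundle needs at least one lane");
  const Expr &First = *Bundle.front();
  for (const ExprPtr &E : Bundle)
    if (E->Op != First.Op || E->Operands.size() != First.Operands.size())
      return false;
  for (std::size_t I = 0; I < First.Operands.size(); ++I) {
    std::vector<ExprPtr> Next;
    for (const ExprPtr &E : Bundle)
      Next.push_back(E->Operands[I]);
    if (!isVectorizableBundle(Next))
      return false;
  }
  return true;
}
```

Each SAXPY lane has the shape store(add(mul(a, load x), load y)), so the four seeds pass the check; changing any one lane's `add` into a different operation makes the bundle non-isomorphic and the check fails.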