Throwing my [r] answer in the hat, optimized for speed and works with any length of x (unlike the asker's which was hard coded for length 20):
### data set.seed(100)x <- round(rnorm(20, sd = 0.02), 3)### solutionsummation <- c(x[1])enn <- 1n_of_seq <- c(enn)for(i in 2:length(x)){ first <- x[i] second <- summation[i - 1] if(sign(first) == sign(second)){ summation <- c(summation, first + second) enn <- enn + 1 }else{ summation <- c(summation, first) enn <- 1 } n_of_seq <- c(n_of_seq, enn) }
And, to compare run times on my current (very slow) work computer, here's the output of my microbenchmark using all of the R solutions in this thread. Unsurprisingly, the solutions making the most copies and conversions tended to be slower.
Unit: microseconds expr min lq mean median uq max neval my_way() 13.301 19.200 23.38352 21.4010 23.401 20604.0 1e+05 author_way() 19.702 31.701 40.12371 36.0015 40.502 24393.9 1e+05 ronak() 856.401 1113.601 1305.36419 1236.8010 1377.501 453191.4 1e+05 ameer() 388.501 452.002 553.08263 491.3000 548.701 456156.6 1e+05 andrew() 2007.801 2336.801 2748.57713 2518.1510 2760.302 463175.8 1e+05 gonzo() 21.901 35.502 48.84946 43.9010 51.001 29519.5 1e+05
--------------EDIT--------------It was pointed out by @nicola that my solution is not the fastest for longer lengths of x - which should be fairly obvious since I'm continually making copies of vectors by using calls like x <- c(x, y). I only created the fastest solution for lengths = 20 and just microbenchmarked as low as I could go for that.
To make a fairer comparison I edited all versions to generate the original code in the way I believe would be fastest, but I welcome feedback on that. Here is my full benchmarking code and results for my very slow system. I welcome any feedback.
# originally benchmarked a few different lengthsfor(pie in c(100000)){my_way<- function(){ set.seed(100) x <- round(rnorm(pie, sd = 0.02), 3)summation <- c(x[1])enn <- 1n_of_seq <- c(enn)for(i in 2:length(x)){ first <- x[i] second <- summation[i - 1] if(sign(first) == sign(second)){ summation <- c(summation, first + second) enn <- enn + 1 }else{ summation <- c(summation, first) enn <- 1 } n_of_seq <- c(n_of_seq, enn) }# print(summation)}author_way <- function(){ set.seed(100) x <- round(rnorm(pie, sd = 0.02), 3) sign_indicator <- ifelse(x > 0, 1,-1) sky <- length(x) number_of_sequence <- rep(NA, sky) n <- 1 for (i in 2:sky) { if (sign_indicator[i] == sign_indicator[i - 1]) { n <- n + 1 } else{ n <- 1 } number_of_sequence[i] <- n } number_of_sequence[1] <- 1 ############################# summation <- rep(NA, sky) for (i in 1:sky) { summation[i] <- sum(x[i:(i + 1 - number_of_sequence[i])]) }}# other ppls solutions:ronak <- function(){df <- data.table('x' = round(rnorm(pie, sd = 0.02), 3))df[, c("n_of_sequence", "sum") := list(seq_len(.N), cumsum(x)),rleid(sign(x))]}ameer <- function(){ set.seed(100) x <- round(rnorm(pie, sd = 0.02), 3) run_lengths <- rle(sign(x))$lengths n_of_sequence <- run_lengths %>% map(seq) %>% unlist start <- cumsum(c(1,run_lengths)) start <- start[-length(start)] # start points of each series map2(start,run_lengths,~cumsum(x[.x:(.x+.y-1)])) %>% unlist()}count_and_sum <- function(x){ set.seed(100) x <- round(rnorm(pie, sd = 0.02), 3) runs <- rle((x > 0) * 1)$lengths groups <- split(x, rep(1:length(runs), runs)) output <- function(group) data.frame(x = group, n = seq_along(group), sum = cumsum(group)) result <- as.data.frame(do.call(rbind, lapply(groups, output))) `rownames<-`(result, 1:nrow(result))}andrew <- function(){ set.seed(100) df <- tibble(x = round(rnorm(pie, sd = 0.02), 3)) %>% mutate(seqno = cumsum(c(1, diff(sign(x)) != 0))) %>% #identify sequence ids group_by(seqno) %>% #group by sequences mutate(n_of_sequence = row_number(), #count row numbers for each group sum = cumsum(x)) %>% #cumulative sum for each group ungroup() %>% select(-seqno) }gonzo <- function(){ set.seed(100) x <- round(rnorm(pie, sd = 0.02), 3) n_of_sequence <- runner::streak_run(x > 0) sum <- runner::sum_run(x, k = n_of_sequence)}mi1 <- microbenchmark(my_way(), author_way(), ronak(), ameer(), andrew(), gonzo(), times = 10)print(mi1)}
As these results show, for other lengths than what I optimized for, my version is slow. The longer x is, the slower it gets up to ridiculously slow at everything above 1000. My favorite version is Ronak's which is only the second fastest on my system. GoGonzo is the fastest on my machine by far at these longer lengths.
Unit: milliseconds expr min lq mean median uq max neval my_way() 21276.9027 21428.2694 21604.30191 21581.97970 21806.9543 21896.7105 10 author_way() 82.2465 83.0873 89.42343 84.78315 85.3638 115.4550 10 ronak() 68.3922 69.3067 70.41924 69.84625 71.3509 74.7070 10 ameer() 481.4566 509.7552 521.19034 514.77000 530.1121 579.4707 10 andrew() 200.9654 202.1898 210.84914 206.20465 211.2006 233.7618 10 gonzo() 27.3317 28.2550 28.66679 28.50535 28.9104 29.9549 10