Quantcast
Channel: Count and summation of positive and negative number sequences - Stack Overflow
Viewing all articles
Browse latest Browse all 15

Answer by Adverse_Event for Count and summation of positive and negative number sequences

$
0
0

Throwing my [r] answer in the hat, optimized for speed and works with any length of x (unlike the asker's which was hard coded for length 20):

### data set.seed(100)x <- round(rnorm(20, sd = 0.02), 3)### solutionsummation <- c(x[1])enn <- 1n_of_seq <- c(enn)for(i in 2:length(x)){  first <- x[i]  second <- summation[i - 1]  if(sign(first) == sign(second)){    summation <- c(summation, first + second)    enn <- enn + 1  }else{    summation <- c(summation, first)    enn <- 1  }  n_of_seq <- c(n_of_seq, enn)  }

And, to compare run times on my current (very slow) work computer, here's the output of my microbenchmark using all of the R solutions in this thread. Unsurprisingly, the solutions making the most copies and conversions tended to be slower.

Unit: microseconds         expr      min       lq       mean    median       uq      max neval     my_way()   13.301   19.200   23.38352   21.4010   23.401  20604.0 1e+05 author_way()   19.702   31.701   40.12371   36.0015   40.502  24393.9 1e+05      ronak()  856.401 1113.601 1305.36419 1236.8010 1377.501 453191.4 1e+05      ameer()  388.501  452.002  553.08263  491.3000  548.701 456156.6 1e+05     andrew() 2007.801 2336.801 2748.57713 2518.1510 2760.302 463175.8 1e+05      gonzo()   21.901   35.502   48.84946   43.9010   51.001  29519.5 1e+05

--------------EDIT--------------It was pointed out by @nicola that my solution is not the fastest for longer lengths of x - which should be fairly obvious since I'm continually making copies of vectors by using calls like x <- c(x, y). I only created the fastest solution for lengths = 20 and just microbenchmarked as low as I could go for that.

To make a fairer comparison I edited all versions to generate the original code in the way I believe would be fastest, but I welcome feedback on that. Here is my full benchmarking code and results for my very slow system. I welcome any feedback.

# originally benchmarked a few different lengthsfor(pie in c(100000)){my_way<- function(){  set.seed(100)  x <- round(rnorm(pie, sd = 0.02), 3)summation <- c(x[1])enn <- 1n_of_seq <- c(enn)for(i in 2:length(x)){  first <- x[i]  second <- summation[i - 1]  if(sign(first) == sign(second)){    summation <- c(summation, first + second)    enn <- enn + 1  }else{    summation <- c(summation, first)    enn <- 1  }  n_of_seq <- c(n_of_seq, enn)  }# print(summation)}author_way <- function(){  set.seed(100)  x <- round(rnorm(pie, sd = 0.02), 3)  sign_indicator <- ifelse(x > 0, 1,-1)  sky <- length(x)  number_of_sequence <- rep(NA, sky)  n <- 1  for (i in 2:sky) {    if (sign_indicator[i] == sign_indicator[i - 1]) {      n <- n + 1    } else{      n <- 1    }    number_of_sequence[i] <- n  }  number_of_sequence[1] <- 1  #############################  summation <- rep(NA, sky)  for (i in 1:sky) {    summation[i] <- sum(x[i:(i + 1 - number_of_sequence[i])])  }}# other ppls solutions:ronak <- function(){df <- data.table('x' = round(rnorm(pie, sd = 0.02), 3))df[, c("n_of_sequence", "sum") := list(seq_len(.N), cumsum(x)),rleid(sign(x))]}ameer <- function(){  set.seed(100)  x <- round(rnorm(pie, sd = 0.02), 3)  run_lengths <- rle(sign(x))$lengths  n_of_sequence <- run_lengths %>% map(seq) %>% unlist  start <- cumsum(c(1,run_lengths))  start <- start[-length(start)] # start points of each series   map2(start,run_lengths,~cumsum(x[.x:(.x+.y-1)])) %>% unlist()}count_and_sum <- function(x){  set.seed(100)  x <- round(rnorm(pie, sd = 0.02), 3)  runs   <- rle((x > 0) * 1)$lengths  groups <- split(x, rep(1:length(runs), runs))  output <- function(group) data.frame(x = group, n = seq_along(group), sum = cumsum(group))  result <- as.data.frame(do.call(rbind, lapply(groups, output)))  `rownames<-`(result, 1:nrow(result))}andrew <- function(){  set.seed(100)  df <- tibble(x = round(rnorm(pie, sd = 0.02), 3)) %>%     mutate(seqno = cumsum(c(1, diff(sign(x)) != 0))) %>% #identify sequence ids    group_by(seqno) %>%                                  #group by sequences    mutate(n_of_sequence = row_number(),                 #count row numbers for each group           sum = cumsum(x)) %>%                          #cumulative sum for each group    ungroup() %>%     select(-seqno) }gonzo <- function(){  set.seed(100)  x <- round(rnorm(pie, sd = 0.02), 3)  n_of_sequence <- runner::streak_run(x > 0)  sum <- runner::sum_run(x, k = n_of_sequence)}mi1 <- microbenchmark(my_way(), author_way(), ronak(), ameer(), andrew(), gonzo(), times = 10)print(mi1)}

As these results show, for other lengths than what I optimized for, my version is slow. The longer x is, the slower it gets up to ridiculously slow at everything above 1000. My favorite version is Ronak's which is only the second fastest on my system. GoGonzo is the fastest on my machine by far at these longer lengths.

Unit: milliseconds         expr        min         lq        mean      median         uq        max neval     my_way() 21276.9027 21428.2694 21604.30191 21581.97970 21806.9543 21896.7105    10 author_way()    82.2465    83.0873    89.42343    84.78315    85.3638   115.4550    10      ronak()    68.3922    69.3067    70.41924    69.84625    71.3509    74.7070    10      ameer()   481.4566   509.7552   521.19034   514.77000   530.1121   579.4707    10     andrew()   200.9654   202.1898   210.84914   206.20465   211.2006   233.7618    10      gonzo()    27.3317    28.2550    28.66679    28.50535    28.9104    29.9549    10

Viewing all articles
Browse latest Browse all 15

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>