Rank groups within a grouped sequence of TRUE/FALSE and NA





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







9















I have a little nut to crack.



I have a data.frame like this:



   group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE

structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))



And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.



So the result should look like:



    group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA



I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.



Thanks for the help!










share|improve this question




















  • 1





    you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

    – user20650
    19 mins ago













  • that is a really funny solution. Very good job!

    – Humpelstielzchen
    15 mins ago











  • In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

    – smci
    15 mins ago











  • No, when group A stops so stops the sequence for group A.

    – Humpelstielzchen
    13 mins ago













  • But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

    – smci
    12 mins ago




















9















I have a little nut to crack.



I have a data.frame like this:



   group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE

structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))



And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.



So the result should look like:



    group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA



I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.



Thanks for the help!










share|improve this question




















  • 1





    you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

    – user20650
    19 mins ago













  • that is a really funny solution. Very good job!

    – Humpelstielzchen
    15 mins ago











  • In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

    – smci
    15 mins ago











  • No, when group A stops so stops the sequence for group A.

    – Humpelstielzchen
    13 mins ago













  • But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

    – smci
    12 mins ago
















9












9








9








I have a little nut to crack.



I have a data.frame like this:



   group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE

structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))



And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.



So the result should look like:



    group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA



I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.



Thanks for the help!










share|improve this question
















I have a little nut to crack.



I have a data.frame like this:



   group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE

structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))



And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.



So the result should look like:



    group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA



I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.



Thanks for the help!







r dplyr data.table rank






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 2 mins ago







Humpelstielzchen

















asked 2 hours ago









HumpelstielzchenHumpelstielzchen

1,3521317




1,3521317








  • 1





    you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

    – user20650
    19 mins ago













  • that is a really funny solution. Very good job!

    – Humpelstielzchen
    15 mins ago











  • In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

    – smci
    15 mins ago











  • No, when group A stops so stops the sequence for group A.

    – Humpelstielzchen
    13 mins ago













  • But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

    – smci
    12 mins ago
















  • 1





    you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

    – user20650
    19 mins ago













  • that is a really funny solution. Very good job!

    – Humpelstielzchen
    15 mins ago











  • In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

    – smci
    15 mins ago











  • No, when group A stops so stops the sequence for group A.

    – Humpelstielzchen
    13 mins ago













  • But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

    – smci
    12 mins ago










1




1





you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
19 mins ago







you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
19 mins ago















that is a really funny solution. Very good job!

– Humpelstielzchen
15 mins ago





that is a really funny solution. Very good job!

– Humpelstielzchen
15 mins ago













In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
15 mins ago





In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
15 mins ago













No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
13 mins ago







No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
13 mins ago















But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
12 mins ago







But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
12 mins ago














4 Answers
4






active

oldest

votes


















5














Another data.table approach:



library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]





share|improve this answer



















  • 1





    Tried with rleid but didn't get it to work. (+1)

    – markus
    37 mins ago











  • works for me. And seems to be the most elegant answer.

    – Humpelstielzchen
    33 mins ago



















4














Maybe I have over-complicated this but one way with dplyr is



library(dplyr)

df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)

# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA


We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.






share|improve this answer































    3














    A data.table option using rle



    library(data.table)
    DT <- as.data.table(dat)
    DT[, goal := {
    r <- rle(replace(criterium, is.na(criterium), FALSE))
    r$values <- with(r, cumsum(values) * values)
    out <- inverse.rle(r)
    replace(out, out == 0, NA)
    }, by = group]
    DT
    # group criterium goal
    # 1: A NA NA
    # 2: A TRUE 1
    # 3: A TRUE 1
    # 4: A TRUE 1
    # 5: A FALSE NA
    # 6: A FALSE NA
    # 7: A TRUE 2
    # 8: A TRUE 2
    # 9: A FALSE NA
    #10: A TRUE 3
    #11: A TRUE 3
    #12: A TRUE 3
    #13: B NA NA
    #14: B FALSE NA
    #15: B TRUE 1
    #16: B TRUE 1
    #17: B TRUE 1
    #18: B FALSE NA


    step by step



    When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle



    r
    #Run Length Encoding
    # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
    # values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...


    We manipulate the values compenent in the following way



    r$values <- with(r, cumsum(values) * values)
    r
    #Run Length Encoding
    # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
    # values : int [1:9] 0 1 0 2 0 3 0 4 0


    That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeatedlenghts` times



    inverse.rle(r)
    # [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0


    This is almost what OP want, only we need to replace the 0s with NA.



    This is done for each group.



    data



    dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
    "B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
    FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
    TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
    -18L))





    share|improve this answer


























    • Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

      – Humpelstielzchen
      1 hour ago






    • 1





      @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

      – markus
      1 hour ago













    • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

      – Humpelstielzchen
      29 mins ago



















    2














    We can create a custom function via rle, and use it per group, i.e.



    f1 <- function(x) {
    x[is.na(x)] <- FALSE
    rle1 <- rle(x)
    y <- rle1$values
    rle1$values[!y] <- 0
    rle1$values[y] <- cumsum(rle1$values[y])
    return(inverse.rle(rle1))
    }


    do.call(rbind,
    lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
    i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
    i}))


    Of course, If you want you can apply it via dplyr, i.e.



    library(dplyr)

    df %>%
    group_by(group) %>%
    mutate(goal = f1(criterium),
    goal = replace(goal, is.na(criterium)|!criterium, NA))


    which gives,




    # A tibble: 18 x 3
    # Groups: group [2]
    group criterium goal
    <fct> <lgl> <dbl>
    1 A NA NA
    2 A TRUE 1
    3 A TRUE 1
    4 A TRUE 1
    5 A FALSE NA
    6 A FALSE NA
    7 A TRUE 2
    8 A TRUE 2
    9 A FALSE NA
    10 A TRUE 3
    11 A TRUE 3
    12 A TRUE 3
    13 B NA NA
    14 B FALSE NA
    15 B TRUE 1
    16 B TRUE 1
    17 B TRUE 1
    18 B FALSE NA






    share|improve this answer


























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      5














      Another data.table approach:



      library(data.table)
      setDT(dt)
      dt[, cr := rleid(criterium)][
      (criterium), goal := rleid(cr), by=.(group)]





      share|improve this answer



















      • 1





        Tried with rleid but didn't get it to work. (+1)

        – markus
        37 mins ago











      • works for me. And seems to be the most elegant answer.

        – Humpelstielzchen
        33 mins ago
















      5














      Another data.table approach:



      library(data.table)
      setDT(dt)
      dt[, cr := rleid(criterium)][
      (criterium), goal := rleid(cr), by=.(group)]





      share|improve this answer



















      • 1





        Tried with rleid but didn't get it to work. (+1)

        – markus
        37 mins ago











      • works for me. And seems to be the most elegant answer.

        – Humpelstielzchen
        33 mins ago














      5












      5








      5







      Another data.table approach:



      library(data.table)
      setDT(dt)
      dt[, cr := rleid(criterium)][
      (criterium), goal := rleid(cr), by=.(group)]





      share|improve this answer













      Another data.table approach:



      library(data.table)
      setDT(dt)
      dt[, cr := rleid(criterium)][
      (criterium), goal := rleid(cr), by=.(group)]






      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered 45 mins ago









      chinsoon12chinsoon12

      9,87611420




      9,87611420








      • 1





        Tried with rleid but didn't get it to work. (+1)

        – markus
        37 mins ago











      • works for me. And seems to be the most elegant answer.

        – Humpelstielzchen
        33 mins ago














      • 1





        Tried with rleid but didn't get it to work. (+1)

        – markus
        37 mins ago











      • works for me. And seems to be the most elegant answer.

        – Humpelstielzchen
        33 mins ago








      1




      1





      Tried with rleid but didn't get it to work. (+1)

      – markus
      37 mins ago





      Tried with rleid but didn't get it to work. (+1)

      – markus
      37 mins ago













      works for me. And seems to be the most elegant answer.

      – Humpelstielzchen
      33 mins ago





      works for me. And seems to be the most elegant answer.

      – Humpelstielzchen
      33 mins ago













      4














      Maybe I have over-complicated this but one way with dplyr is



      library(dplyr)

      df %>%
      mutate(temp = replace(criterium, is.na(criterium), FALSE),
      temp1 = cumsum(!temp)) %>%
      group_by(temp1) %>%
      mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
      group_by(group) %>%
      mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
      select(-temp, -temp1)

      # group criterium goal
      # <fct> <lgl> <int>
      # 1 A NA NA
      # 2 A TRUE 1
      # 3 A TRUE 1
      # 4 A TRUE 1
      # 5 A FALSE NA
      # 6 A FALSE NA
      # 7 A TRUE 2
      # 8 A TRUE 2
      # 9 A FALSE NA
      #10 A TRUE 3
      #11 A TRUE 3
      #12 A TRUE 3
      #13 B NA NA
      #14 B FALSE NA
      #15 B TRUE 1
      #16 B TRUE 1
      #17 B TRUE 1
      #18 B FALSE NA


      We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.






      share|improve this answer




























        4














        Maybe I have over-complicated this but one way with dplyr is



        library(dplyr)

        df %>%
        mutate(temp = replace(criterium, is.na(criterium), FALSE),
        temp1 = cumsum(!temp)) %>%
        group_by(temp1) %>%
        mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
        group_by(group) %>%
        mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
        select(-temp, -temp1)

        # group criterium goal
        # <fct> <lgl> <int>
        # 1 A NA NA
        # 2 A TRUE 1
        # 3 A TRUE 1
        # 4 A TRUE 1
        # 5 A FALSE NA
        # 6 A FALSE NA
        # 7 A TRUE 2
        # 8 A TRUE 2
        # 9 A FALSE NA
        #10 A TRUE 3
        #11 A TRUE 3
        #12 A TRUE 3
        #13 B NA NA
        #14 B FALSE NA
        #15 B TRUE 1
        #16 B TRUE 1
        #17 B TRUE 1
        #18 B FALSE NA


        We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.






        share|improve this answer


























          4












          4








          4







          Maybe I have over-complicated this but one way with dplyr is



          library(dplyr)

          df %>%
          mutate(temp = replace(criterium, is.na(criterium), FALSE),
          temp1 = cumsum(!temp)) %>%
          group_by(temp1) %>%
          mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
          group_by(group) %>%
          mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
          select(-temp, -temp1)

          # group criterium goal
          # <fct> <lgl> <int>
          # 1 A NA NA
          # 2 A TRUE 1
          # 3 A TRUE 1
          # 4 A TRUE 1
          # 5 A FALSE NA
          # 6 A FALSE NA
          # 7 A TRUE 2
          # 8 A TRUE 2
          # 9 A FALSE NA
          #10 A TRUE 3
          #11 A TRUE 3
          #12 A TRUE 3
          #13 B NA NA
          #14 B FALSE NA
          #15 B TRUE 1
          #16 B TRUE 1
          #17 B TRUE 1
          #18 B FALSE NA


          We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.






          share|improve this answer













          Maybe I have over-complicated this but one way with dplyr is



          library(dplyr)

          df %>%
          mutate(temp = replace(criterium, is.na(criterium), FALSE),
          temp1 = cumsum(!temp)) %>%
          group_by(temp1) %>%
          mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
          group_by(group) %>%
          mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
          select(-temp, -temp1)

          # group criterium goal
          # <fct> <lgl> <int>
          # 1 A NA NA
          # 2 A TRUE 1
          # 3 A TRUE 1
          # 4 A TRUE 1
          # 5 A FALSE NA
          # 6 A FALSE NA
          # 7 A TRUE 2
          # 8 A TRUE 2
          # 9 A FALSE NA
          #10 A TRUE 3
          #11 A TRUE 3
          #12 A TRUE 3
          #13 B NA NA
          #14 B FALSE NA
          #15 B TRUE 1
          #16 B TRUE 1
          #17 B TRUE 1
          #18 B FALSE NA


          We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 1 hour ago









          Ronak ShahRonak Shah

          45.9k104268




          45.9k104268























              3














              A data.table option using rle



              library(data.table)
              DT <- as.data.table(dat)
              DT[, goal := {
              r <- rle(replace(criterium, is.na(criterium), FALSE))
              r$values <- with(r, cumsum(values) * values)
              out <- inverse.rle(r)
              replace(out, out == 0, NA)
              }, by = group]
              DT
              # group criterium goal
              # 1: A NA NA
              # 2: A TRUE 1
              # 3: A TRUE 1
              # 4: A TRUE 1
              # 5: A FALSE NA
              # 6: A FALSE NA
              # 7: A TRUE 2
              # 8: A TRUE 2
              # 9: A FALSE NA
              #10: A TRUE 3
              #11: A TRUE 3
              #12: A TRUE 3
              #13: B NA NA
              #14: B FALSE NA
              #15: B TRUE 1
              #16: B TRUE 1
              #17: B TRUE 1
              #18: B FALSE NA


              step by step



              When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle



              r
              #Run Length Encoding
              # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
              # values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...


              We manipulate the values compenent in the following way



              r$values <- with(r, cumsum(values) * values)
              r
              #Run Length Encoding
              # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
              # values : int [1:9] 0 1 0 2 0 3 0 4 0


              That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeatedlenghts` times



              inverse.rle(r)
              # [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0


              This is almost what OP want, only we need to replace the 0s with NA.



              This is done for each group.



              data



              dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
              1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
              "B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
              FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
              TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
              -18L))





              share|improve this answer


























              • Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

                – Humpelstielzchen
                1 hour ago






              • 1





                @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

                – markus
                1 hour ago













              • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

                – Humpelstielzchen
                29 mins ago
















              3














              A data.table option using rle



              library(data.table)
              DT <- as.data.table(dat)
              DT[, goal := {
              r <- rle(replace(criterium, is.na(criterium), FALSE))
              r$values <- with(r, cumsum(values) * values)
              out <- inverse.rle(r)
              replace(out, out == 0, NA)
              }, by = group]
              DT
              # group criterium goal
              # 1: A NA NA
              # 2: A TRUE 1
              # 3: A TRUE 1
              # 4: A TRUE 1
              # 5: A FALSE NA
              # 6: A FALSE NA
              # 7: A TRUE 2
              # 8: A TRUE 2
              # 9: A FALSE NA
              #10: A TRUE 3
              #11: A TRUE 3
              #12: A TRUE 3
              #13: B NA NA
              #14: B FALSE NA
              #15: B TRUE 1
              #16: B TRUE 1
              #17: B TRUE 1
              #18: B FALSE NA


              step by step



              When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle



              r
              #Run Length Encoding
              # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
              # values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...


              We manipulate the values compenent in the following way



              r$values <- with(r, cumsum(values) * values)
              r
              #Run Length Encoding
              # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
              # values : int [1:9] 0 1 0 2 0 3 0 4 0


              That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeatedlenghts` times



              inverse.rle(r)
              # [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0


              This is almost what OP want, only we need to replace the 0s with NA.



              This is done for each group.



              data



              dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
              1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
              "B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
              FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
              TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
              -18L))





              share|improve this answer


























              • Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

                – Humpelstielzchen
                1 hour ago






              • 1





                @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

                – markus
                1 hour ago













              • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

                – Humpelstielzchen
                29 mins ago














              3












              3








              3







              A data.table option using rle



              library(data.table)
              DT <- as.data.table(dat)
              DT[, goal := {
              r <- rle(replace(criterium, is.na(criterium), FALSE))
              r$values <- with(r, cumsum(values) * values)
              out <- inverse.rle(r)
              replace(out, out == 0, NA)
              }, by = group]
              DT
              # group criterium goal
              # 1: A NA NA
              # 2: A TRUE 1
              # 3: A TRUE 1
              # 4: A TRUE 1
              # 5: A FALSE NA
              # 6: A FALSE NA
              # 7: A TRUE 2
              # 8: A TRUE 2
              # 9: A FALSE NA
              #10: A TRUE 3
              #11: A TRUE 3
              #12: A TRUE 3
              #13: B NA NA
              #14: B FALSE NA
              #15: B TRUE 1
              #16: B TRUE 1
              #17: B TRUE 1
              #18: B FALSE NA


              step by step



              When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle



              r
              #Run Length Encoding
              # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
              # values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...


              We manipulate the values compenent in the following way



              r$values <- with(r, cumsum(values) * values)
              r
              #Run Length Encoding
              # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
              # values : int [1:9] 0 1 0 2 0 3 0 4 0


              That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeatedlenghts` times



              inverse.rle(r)
              # [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0


              This is almost what OP want, only we need to replace the 0s with NA.



              This is done for each group.



              data



              dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
              1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
              "B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
              FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
              TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
              -18L))





              share|improve this answer















              A data.table option using rle



              library(data.table)
              DT <- as.data.table(dat)
              DT[, goal := {
              r <- rle(replace(criterium, is.na(criterium), FALSE))
              r$values <- with(r, cumsum(values) * values)
              out <- inverse.rle(r)
              replace(out, out == 0, NA)
              }, by = group]
              DT
              # group criterium goal
              # 1: A NA NA
              # 2: A TRUE 1
              # 3: A TRUE 1
              # 4: A TRUE 1
              # 5: A FALSE NA
              # 6: A FALSE NA
              # 7: A TRUE 2
              # 8: A TRUE 2
              # 9: A FALSE NA
              #10: A TRUE 3
              #11: A TRUE 3
              #12: A TRUE 3
              #13: B NA NA
              #14: B FALSE NA
              #15: B TRUE 1
              #16: B TRUE 1
              #17: B TRUE 1
              #18: B FALSE NA


              step by step



              When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle



              r
              #Run Length Encoding
              # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
              # values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...


              We manipulate the values compenent in the following way



              r$values <- with(r, cumsum(values) * values)
              r
              #Run Length Encoding
              # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
              # values : int [1:9] 0 1 0 2 0 3 0 4 0


              That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeatedlenghts` times



              inverse.rle(r)
              # [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0


              This is almost what OP want, only we need to replace the 0s with NA.



              This is done for each group.



              data



              dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
              1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
              "B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
              FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
              TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
              -18L))






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited 41 mins ago

























              answered 1 hour ago









              markusmarkus

              15.4k11336




              15.4k11336













              • Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

                – Humpelstielzchen
                1 hour ago






              • 1





                @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

                – markus
                1 hour ago













              • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

                – Humpelstielzchen
                29 mins ago



















              • Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

                – Humpelstielzchen
                1 hour ago






              • 1





                @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

                – markus
                1 hour ago













              • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

                – Humpelstielzchen
                29 mins ago

















              Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

              – Humpelstielzchen
              1 hour ago





              Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

              – Humpelstielzchen
              1 hour ago




              1




              1





              @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

              – markus
              1 hour ago







              @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

              – markus
              1 hour ago















              Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

              – Humpelstielzchen
              29 mins ago





              Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

              – Humpelstielzchen
              29 mins ago











              2














              We can create a custom function via rle, and use it per group, i.e.



              f1 <- function(x) {
              x[is.na(x)] <- FALSE
              rle1 <- rle(x)
              y <- rle1$values
              rle1$values[!y] <- 0
              rle1$values[y] <- cumsum(rle1$values[y])
              return(inverse.rle(rle1))
              }


              do.call(rbind,
              lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
              i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
              i}))


              Of course, If you want you can apply it via dplyr, i.e.



              library(dplyr)

              df %>%
              group_by(group) %>%
              mutate(goal = f1(criterium),
              goal = replace(goal, is.na(criterium)|!criterium, NA))


              which gives,




              # A tibble: 18 x 3
              # Groups: group [2]
              group criterium goal
              <fct> <lgl> <dbl>
              1 A NA NA
              2 A TRUE 1
              3 A TRUE 1
              4 A TRUE 1
              5 A FALSE NA
              6 A FALSE NA
              7 A TRUE 2
              8 A TRUE 2
              9 A FALSE NA
              10 A TRUE 3
              11 A TRUE 3
              12 A TRUE 3
              13 B NA NA
              14 B FALSE NA
              15 B TRUE 1
              16 B TRUE 1
              17 B TRUE 1
              18 B FALSE NA






              share|improve this answer






























                2














                We can create a custom function via rle, and use it per group, i.e.



                f1 <- function(x) {
                x[is.na(x)] <- FALSE
                rle1 <- rle(x)
                y <- rle1$values
                rle1$values[!y] <- 0
                rle1$values[y] <- cumsum(rle1$values[y])
                return(inverse.rle(rle1))
                }


                do.call(rbind,
                lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
                i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
                i}))


                Of course, If you want you can apply it via dplyr, i.e.



                library(dplyr)

                df %>%
                group_by(group) %>%
                mutate(goal = f1(criterium),
                goal = replace(goal, is.na(criterium)|!criterium, NA))


                which gives,




                # A tibble: 18 x 3
                # Groups: group [2]
                group criterium goal
                <fct> <lgl> <dbl>
                1 A NA NA
                2 A TRUE 1
                3 A TRUE 1
                4 A TRUE 1
                5 A FALSE NA
                6 A FALSE NA
                7 A TRUE 2
                8 A TRUE 2
                9 A FALSE NA
                10 A TRUE 3
                11 A TRUE 3
                12 A TRUE 3
                13 B NA NA
                14 B FALSE NA
                15 B TRUE 1
                16 B TRUE 1
                17 B TRUE 1
                18 B FALSE NA






                share|improve this answer




























                  2












                  2








                  2







                  We can create a custom function via rle, and use it per group, i.e.



                  f1 <- function(x) {
                  x[is.na(x)] <- FALSE
                  rle1 <- rle(x)
                  y <- rle1$values
                  rle1$values[!y] <- 0
                  rle1$values[y] <- cumsum(rle1$values[y])
                  return(inverse.rle(rle1))
                  }


                  do.call(rbind,
                  lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
                  i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
                  i}))


                  Of course, If you want you can apply it via dplyr, i.e.



                  library(dplyr)

                  df %>%
                  group_by(group) %>%
                  mutate(goal = f1(criterium),
                  goal = replace(goal, is.na(criterium)|!criterium, NA))


                  which gives,




                  # A tibble: 18 x 3
                  # Groups: group [2]
                  group criterium goal
                  <fct> <lgl> <dbl>
                  1 A NA NA
                  2 A TRUE 1
                  3 A TRUE 1
                  4 A TRUE 1
                  5 A FALSE NA
                  6 A FALSE NA
                  7 A TRUE 2
                  8 A TRUE 2
                  9 A FALSE NA
                  10 A TRUE 3
                  11 A TRUE 3
                  12 A TRUE 3
                  13 B NA NA
                  14 B FALSE NA
                  15 B TRUE 1
                  16 B TRUE 1
                  17 B TRUE 1
                  18 B FALSE NA






                  share|improve this answer















                  We can create a custom function via rle, and use it per group, i.e.



                  f1 <- function(x) {
                  x[is.na(x)] <- FALSE
                  rle1 <- rle(x)
                  y <- rle1$values
                  rle1$values[!y] <- 0
                  rle1$values[y] <- cumsum(rle1$values[y])
                  return(inverse.rle(rle1))
                  }


                  do.call(rbind,
                  lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
                  i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
                  i}))


                  Of course, If you want you can apply it via dplyr, i.e.



                  library(dplyr)

                  df %>%
                  group_by(group) %>%
                  mutate(goal = f1(criterium),
                  goal = replace(goal, is.na(criterium)|!criterium, NA))


                  which gives,




                  # A tibble: 18 x 3
                  # Groups: group [2]
                  group criterium goal
                  <fct> <lgl> <dbl>
                  1 A NA NA
                  2 A TRUE 1
                  3 A TRUE 1
                  4 A TRUE 1
                  5 A FALSE NA
                  6 A FALSE NA
                  7 A TRUE 2
                  8 A TRUE 2
                  9 A FALSE NA
                  10 A TRUE 3
                  11 A TRUE 3
                  12 A TRUE 3
                  13 B NA NA
                  14 B FALSE NA
                  15 B TRUE 1
                  16 B TRUE 1
                  17 B TRUE 1
                  18 B FALSE NA







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited 1 hour ago

























                  answered 1 hour ago









                  SotosSotos

                  31.3k51741




                  31.3k51741






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Усть-Каменогорск

                      Халкинская богословская школа

                      Where does the word Sparryheid come from and mean?