Rank groups within a grouped sequence of TRUE/FALSE and NA

I have a little nut to crack.

I have a data.frame like this:

   group criterium
1      A        NA
2      A      TRUE
3      A      TRUE
4      A      TRUE
5      A     FALSE
6      A     FALSE
7      A      TRUE
8      A      TRUE
9      A     FALSE
10     A      TRUE
11     A      TRUE
12     A      TRUE
13     B        NA
14     B     FALSE
15     B      TRUE
16     B      TRUE
17     B      TRUE
18     B     FALSE



structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 
-18L))

I want to rank the groups (consecutive runs) of TRUE in column criterium in ascending order while disregarding FALSE and NA. The goal is to have a unique run identifier within each level of group.

So the result should look like:

    group criterium goal
1      A        NA   NA
2      A      TRUE    1
3      A      TRUE    1
4      A      TRUE    1
5      A     FALSE   NA
6      A     FALSE   NA
7      A      TRUE    2
8      A      TRUE    2
9      A     FALSE   NA
10     A      TRUE    3
11     A      TRUE    3
12     A      TRUE    3
13     B        NA   NA
14     B     FALSE   NA
15     B      TRUE    1
16     B      TRUE    1
17     B      TRUE    1
18     B     FALSE   NA

I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.

Thanks for the help!

edited 2 mins ago

asked 2 hours ago

Humpelstielzchen


you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
19 mins ago
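
Applied per group, the one-liner above might look like the following base R sketch. The name d follows the comment; run_rank is a hypothetical helper, and ave() is only one of several ways to do the grouping.

```r
# Data from the question; the name `d` follows the comment above.
d <- data.frame(
  group = factor(rep(c("A", "B"), times = c(12, 6))),
  criterium = c(NA, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE,
                TRUE, TRUE, TRUE, NA, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Hypothetical helper wrapping the one-liner.
# x^NA keeps 1 (TRUE) as 1 and turns 0 (FALSE) and NA into NA.
run_rank <- function(x) {
  keep <- x^NA
  as.numeric(as.factor(cumsum(is.na(keep)) + keep))
}

# ave() applies run_rank() within each level of group and keeps the row order.
d$goal <- ave(as.numeric(d$criterium), d$group, FUN = run_rank)
d
```

The conversion through as.factor() is what squeezes the irregular sums (2, 4, 5 in group A) down to consecutive ranks.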

that is a really funny solution. Very good job!

– Humpelstielzchen
15 mins ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
15 mins ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
13 mins ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
12 mins ago
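
The edge case raised in this comment can be checked on a small constructed example. The sketch below is base R; true_run_id and d2 are hypothetical names that do not appear in the thread. It compares numbering the TRUE runs over the whole column with numbering them within each group.

```r
# Hypothetical helper: number consecutive runs of TRUE, NA elsewhere.
true_run_id <- function(x) {
  x <- !is.na(x) & x                      # treat NA like FALSE
  starts <- x & !c(FALSE, head(x, -1))    # TRUE where a run of TRUEs begins
  ifelse(x, cumsum(starts), NA_integer_)
}

# Constructed data: group A ends with TRUE and group B starts with TRUE.
d2 <- data.frame(
  group = c("A", "A", "A", "B", "B"),
  criterium = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Ignoring group: the A/B boundary is swallowed by a single run.
ungrouped <- true_run_id(d2$criterium)

# Per group: split, number the runs, and put the pieces back in row order.
grouped <- unsplit(lapply(split(d2$criterium, d2$group), true_run_id), d2$group)

cbind(d2, ungrouped, grouped)
```

Row 4 gets 2 when group is ignored and 1 when the runs are numbered per group. The per-group result matches the asker's reply that the sequence for group A stops where group A stops.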



r dplyr data.table rank


4 Answers

Another data.table approach:

library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
    (criterium), goal := rleid(cr), by=.(group)]
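
To see why this works, it may help to look at the intermediate column. The following is a sketch, assuming the question's data is stored under the name dt (the answer does not show where dt comes from); it runs the two chained steps one at a time.

```r
library(data.table)

# Assumption: the question's data, built directly as a data.table named dt.
dt <- data.table(
  group = factor(rep(c("A", "B"), times = c(12, 6))),
  criterium = c(NA, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE,
                TRUE, TRUE, TRUE, NA, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Step 1: rleid() numbers every run of identical values (NA included)
# across the whole column; groups are ignored at this point.
dt[, cr := rleid(criterium)]
dt$cr

# Step 2: keep only rows where criterium is TRUE (NA in i counts as FALSE)
# and renumber their run ids from 1 within each group.
dt[(criterium), goal := rleid(cr), by = .(group)]

# Optional: drop the helper column again.
dt[, cr := NULL]
dt
```

Rows that are not selected in step 2 are never assigned, so goal stays NA for them.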

answered 45 mins ago

chinsoon12


Tried with rleid but didn't get it to work. (+1)

– markus
37 mins ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
33 mins ago


Maybe I have over-complicated this but one way with dplyr is

library(dplyr)

df %>%
  mutate(temp = replace(criterium, is.na(criterium), FALSE),
         temp1 = cumsum(!temp)) %>%
  group_by(temp1) %>%
  mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%
  group_by(group) %>%
  mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
  select(-temp, -temp1)

#  group criterium  goal
#   <fct> <lgl>     <int>
# 1 A     NA           NA
# 2 A     TRUE          1
# 3 A     TRUE          1
# 4 A     TRUE          1
# 5 A     FALSE        NA
# 6 A     FALSE        NA
# 7 A     TRUE          2
# 8 A     TRUE          2
# 9 A     FALSE        NA
#10 A     TRUE          3
#11 A     TRUE          3
#12 A     TRUE          3
#13 B     NA           NA
#14 B     FALSE        NA
#15 B     TRUE          1
#16 B     TRUE          1
#17 B     TRUE          1
#18 B     FALSE        NA

We first replace the NAs in the criterium column with FALSE and take the cumulative sum of its negation (temp1). We then group_by temp1 and assign 1 to the first TRUE value in each group. Finally, grouping by group, we take the cumulative sum for TRUE values and return NA for FALSE and NA values.
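
The same three steps can be traced without dplyr. The following base R sketch is an illustration rather than the answer's own code; it keeps each intermediate vector visible, assuming the question's data is stored as df. The name first_true is made up for this example.

```r
# Assumption: the question's data, stored as df.
df <- data.frame(
  group = factor(rep(c("A", "B"), times = c(12, 6))),
  criterium = c(NA, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE,
                TRUE, TRUE, TRUE, NA, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Step 1: NA counts as FALSE; every non-TRUE row opens a new block in temp1.
temp  <- replace(df$criterium, is.na(df$criterium), FALSE)
temp1 <- cumsum(!temp)

# Step 2: within each temp1 block, flag the first TRUE (if the block has one).
first_true <- ave(temp, temp1,
                  FUN = function(v) seq_along(v) == which.max(v) & any(v))

# Step 3: count the flags within each group, then blank out the non-TRUE rows.
goal <- ave(as.integer(first_true), df$group, FUN = cumsum)
goal[!temp] <- NA

df$goal <- goal
df
```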

answered 1 hour ago

Ronak Shah


A data.table option using rle

library(data.table)
DT <- as.data.table(dat)
DT[, goal := {
  r <- rle(replace(criterium, is.na(criterium), FALSE))
  r$values <- with(r, cumsum(values) * values)
  out <- inverse.rle(r)
  replace(out, out == 0, NA)
}, by = group]
DT
#    group criterium goal
# 1:     A        NA   NA
# 2:     A      TRUE    1
# 3:     A      TRUE    1
# 4:     A      TRUE    1
# 5:     A     FALSE   NA
# 6:     A     FALSE   NA
# 7:     A      TRUE    2
# 8:     A      TRUE    2
# 9:     A     FALSE   NA
#10:     A      TRUE    3
#11:     A      TRUE    3
#12:     A      TRUE    3
#13:     B        NA   NA
#14:     B     FALSE   NA
#15:     B      TRUE    1
#16:     B      TRUE    1
#17:     B      TRUE    1
#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r
#Run Length Encoding
#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1
#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values component in the following way

r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1
#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced the TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which each element of values is repeated lengths times

inverse.rle(r)
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what the OP wants; we only need to replace the 0s with NA.

This is done for each group.
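
Note that the printed run lengths above cover the whole column, which is why the counter reaches 4 there. Inside by = group the same steps run on each group's slice separately, so the counter restarts. A base R sketch for group B alone, where x and goal_b are stand-in names for that slice and its result:

```r
x <- c(NA, FALSE, TRUE, TRUE, TRUE, FALSE)    # criterium for group B

r <- rle(replace(x, is.na(x), FALSE))         # runs: FALSE x2, TRUE x3, FALSE x1
r$values <- with(r, cumsum(values) * values)  # TRUE runs become 1, 2, ...; FALSE runs become 0
out <- inverse.rle(r)                         # expand back to one value per row
goal_b <- replace(out, out == 0, NA)
goal_b
```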

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 
-18L))

edited 41 mins ago

answered 1 hour ago

markus


Wow, impressive. Thanks for introducing me to rle and inverse.rle. Greetings to Leipzig.

– Humpelstielzchen
1 hour ago


@Humpelstielzchen You're welcome. Will try to simplify and explain the logic a bit.

– markus
1 hour ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a devil of a fellow. ^^

– Humpelstielzchen
29 mins ago


We can create a custom function via rle, and use it per group, i.e.

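# Number the TRUE runs of a logical vector; FALSE and NA runs become 0.
f1 <- function(x) {
    x[is.na(x)] <- FALSE                      # treat NA as FALSE
    rle1 <- rle(x)
    y <- rle1$values                          # TRUE for runs of TRUE
    rle1$values[!y] <- 0
    rle1$values[y] <- cumsum(rle1$values[y])  # 1, 2, 3, ... per TRUE run
    return(inverse.rle(rle1))
}

# Apply f1 per group, then set the non-TRUE rows to NA.
do.call(rbind,
        lapply(split(df, df$group), function(i) {
            i$goal <- f1(i$criterium)
            i$goal <- replace(i$goal, is.na(i$criterium) | !i$criterium, NA)
            i
        }))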

Of course, if you want, you can apply it via dplyr, i.e.

library(dplyr)

df %>%
 group_by(group) %>%
 mutate(goal = f1(criterium),
        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3
# Groups:   group [2]
   group criterium  goal
   <fct> <lgl>     <dbl>
 1 A     NA           NA
 2 A     TRUE          1
 3 A     TRUE          1
 4 A     TRUE          1
 5 A     FALSE        NA
 6 A     FALSE        NA
 7 A     TRUE          2
 8 A     TRUE          2
 9 A     FALSE        NA
10 A     TRUE          3
11 A     TRUE          3
12 A     TRUE          3
13 B     NA           NA
14 B     FALSE        NA
15 B     TRUE          1
16 B     TRUE          1
17 B     TRUE          1
18 B     FALSE        NA

edited 1 hour ago

answered 1 hour ago

Sotos

31.3k51741

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');

var $window = $(window),
onScroll = function(e) {
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom)) {
StackExchange.using('gps', function() { StackExchange.gps.track('embedded_signup_form.view', { location: 'question_page' }); });
$window.unbind('scroll', onScroll);
}
};
$window.on('scroll', onScroll);

});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 45 mins ago

chinsoon12

9,87611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
37 mins ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
33 mins ago

add a comment |

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 45 mins ago

chinsoon12

9,87611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
37 mins ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
33 mins ago

add a comment |

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 45 mins ago

chinsoon12

9,87611420

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 45 mins ago

chinsoon12

9,87611420

answered 45 mins ago

chinsoon12

9,87611420

answered 45 mins ago

chinsoon12

9,87611420

answered 45 mins ago

chinsoon12

9,87611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
37 mins ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
33 mins ago

add a comment |

1

Tried with rleid but didn't get it to work. (+1)

– markus
37 mins ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
33 mins ago

Tried with rleid but didn't get it to work. (+1)

– markus
37 mins ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
33 mins ago

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 1 hour ago

Ronak Shah

45.9k104268

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 1 hour ago

Ronak Shah

45.9k104268

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 1 hour ago

Ronak Shah

45.9k104268

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 1 hour ago

Ronak Shah

45.9k104268

answered 1 hour ago

Ronak Shah

45.9k104268

answered 1 hour ago

Ronak Shah

45.9k104268

answered 1 hour ago

Ronak Shah

45.9k104268

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeatedlenghts` times

inverse.rle(r)

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP want, only we need to replace the 0s with NA.

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 41 mins ago

answered 1 hour ago

markus

15.4k11336

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
1 hour ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
1 hour ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
29 mins ago

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeatedlenghts` times

inverse.rle(r)

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP want, only we need to replace the 0s with NA.

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 41 mins ago

answered 1 hour ago

markus

15.4k11336

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
1 hour ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
1 hour ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
29 mins ago

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeatedlenghts` times

inverse.rle(r)

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP want, only we need to replace the 0s with NA.

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 41 mins ago

answered 1 hour ago

markus

15.4k11336

We can create a custom function via rle, and use it per group, i.e.

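f1 <- function(x) {
    # Treat NA like FALSE so it never starts or extends a TRUE run
    x[is.na(x)] <- FALSE
    rle1 <- rle(x)
    # Remember which runs are TRUE before the values are overwritten
    y <- rle1$values
    # FALSE runs get 0
    rle1$values[!y] <- 0
    # TRUE runs are numbered 1, 2, 3, ... in order of appearance
    rle1$values[y] <- cumsum(rle1$values[y])
    return(inverse.rle(rle1))
}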
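
# df is the data frame from the question
do.call(rbind,
        lapply(split(df, df$group), function(i) {
          i$goal <- f1(i$criterium)
          # Rows that are FALSE or NA in criterium get NA instead of 0
          i$goal <- replace(i$goal, is.na(i$criterium) | !i$criterium, NA)
          i
        }))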

Of course, if you prefer, you can apply it via dplyr:

library(dplyr)

df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA
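One way to sanity-check this output is to compute the same column with a different, rle-free formulation and compare. The sketch below is not taken from any of the answers; it is an alternative under assumptions, and the helper name `run_id` is hypothetical. It flags the first element of every TRUE run and takes the cumulative sum of those flags.

```r
# Data from the question, written out as plain vectors
criterium <- c(NA, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE,
               TRUE, TRUE, TRUE, NA, FALSE, TRUE, TRUE, TRUE, FALSE)
group <- rep(c("A", "B"), times = c(12, 6))

# Hypothetical helper: number TRUE runs without rle()
run_id <- function(x) {
  x <- !is.na(x) & x                    # NA counts as FALSE
  prev <- c(FALSE, head(x, -1))         # value of the preceding element
  starts <- x & !prev                   # TRUE where a new TRUE run begins
  id <- cumsum(starts)                  # running count of runs seen so far
  id[!x] <- NA                          # blank out rows outside a TRUE run
  id
}

# Apply per group; ave() keeps the original row order
goal <- ave(criterium, group, FUN = run_id)
goal
```

If both approaches agree on the question's data, that is a reasonable indication that the grouping and the NA handling behave as intended.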

edited 1 hour ago

answered 1 hour ago

Sotos

31.3k51741

