This Website Needs More Cat Pictures?












I love cats! And I love browsing cat pictures! :3

I stumbled on a webpage that shows a random cat picture every time I refresh the page. (For example, this site.) It has a database of $N$ cat pictures and picks each one uniformly at random, i.e. with probability $\frac{1}{N}$.

I really enjoyed it. The first $10$ pictures were all different cute pictures. However, the $11$th picture was the same as the $10$th!

Is the database that small?!

So, what is the expected value of the number of pictures in the database (i.e. $N$)?










  • Sounds like a German tank problem. – Oray, 19 hours ago
  • @Oray It's more like capture-recapture. – Jaap Scherphuis, 18 hours ago
  • @JaapScherphuis idd! good catch :) – Oray, 18 hours ago
  • @hexomino It's the expectation value of $N$. – athin, 18 hours ago
  • Related question (to the title). – Rand al'Thor, 12 hours ago
















probability

edited 15 hours ago
asked 19 hours ago
athin (7,865 rep)




5 Answers



















Proceeding along the same lines as hexomino, I get a different result. The expectation




is undefined for two reasons. First of all, you only have an expectation if you have a probability distribution, and to get a probability distribution in this scenario you need to start with a "prior" probability distribution in advance of seeing the outcome. Second, if we use the "obvious" improper uniform prior (i.e., assume all N equally likely) then our "posterior" probability distribution is also improper (i.e., you can't make the "probabilities" add up to a finite value).




The place where I (ha!) diverge from hexomino's answer is




where he rewrites $\frac{N(N-1)\cdots(N-9)}{N^{10}}$ as $\frac{(N-1)!\,10}{(N-8)!\,N^{10}}$, where I think he's made a sign error -- it should be $\frac{(N-1)!\,10}{(N-10)!\,N^{10}}$. This means that the $N$th value is, for large $N$, roughly proportional to $1/N$, whose sum diverges.




We can try to deal with this inconvenient situation in three ways. First,




we can give up on computing an expectation and compute something else that does make sense for an improper probability distribution. For instance, the mode ("maximum likelihood estimate"). This turns out to be at $N=51$, though the value at $N=52$ is very, very close to being the same.
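That mode is easy to check numerically. Here is a quick sketch in Python (my addition, not part of the original answer) that maximises the likelihood $10\,N(N-1)\cdots(N-9)/N^{11}$ directly:

```python
from math import prod

def likelihood(n: int) -> float:
    # P(first 10 pictures distinct, then a repeat at the 11th | N = n)
    if n < 10:
        return 0.0
    return prod((n - k) / n for k in range(10)) * (10 / n)

# the likelihood rises to a single peak and then decays like 1/N,
# so scanning a modest range is enough to locate the mode
mode = max(range(10, 10_000), key=likelihood)
print(mode)  # 51
```

The values at $N=51$ and $N=52$ do indeed come out almost identical.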




Second,




we can adopt a different "general-purpose" prior. For instance, we might use the "logarithmic prior" where the prior "probability" of $N$ is taken to be $1/N$. (For many cases where a variable is known to take positive values only, this is a reasonable approach, though the usual handwavy arguments for it don't really apply here.) In this case, the posterior probability for $N$ is roughly proportional to $1/N^2$, whose sum is finite, so we do get an actual probability distribution -- but, alas, not one that has an expectation, because the sum we'd need to compute for that diverges again. Still, in this case we can still compute the mode (as above) and also the median. The mode is at $N=29$ and the median is anywhere between $N=77$ and $N=78$. (The tail of this distribution is rather "fat", which is why the median is so much bigger than the mode. The mean, again, is infinite -- it's more sensitive than the median to that fat tail.)
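Both figures can be reproduced numerically; here is a rough sketch in Python (mine, not Gareth's Mathematica) that truncates the infinite sums at a large cutoff, which is safe because the $1/N^2$ tail beyond it contributes negligibly:

```python
from math import prod

def likelihood(n: int) -> float:
    # P(10 distinct pictures then a repeat | N = n), up to a constant factor
    if n < 10:
        return 0.0
    return prod((n - k) / n for k in range(10)) / n

CUTOFF = 200_000  # posterior tail ~ 1/N^2, so truncation error is tiny
post = [likelihood(n) / n for n in range(10, CUTOFF)]  # logarithmic prior 1/N
total = sum(post)

# mode of the posterior
mode = 10 + max(range(len(post)), key=post.__getitem__)

# median: smallest N whose cumulative posterior reaches 1/2
acc, median = 0.0, None
for n, p in enumerate(post, start=10):
    acc += p
    if acc >= total / 2:
        median = n
        break

print(mode, median)  # mode 29; the median lands at 77 (or 78)
```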




Third,




we can make use of our actual knowledge about cat picture websites and try to come up with a more realistic prior based on that knowledge. There are an awful lot of fairly plausible ways to do this, and unfortunately they will all give different answers. The upside is that if our prior falls off rapidly enough for large $N$ (which it will -- indeed, since presumably only finitely many cat pictures have ever been taken, it is zero for large enough $N$) then everything converges and we can compute expectations to our hearts' content. For instance, suppose we choose a prior that's uniform on $[1,M]$ and zero for $N>M$; then there's a rather ugly explicit-ish formula that for large $M$ is approximately $M/\log M$, but that limit is approached very slowly -- e.g., for $M=10^5$ the expectation is about 14000 while the approximation is about 8700. For any of these distributions with $M\ge51$ the mode ("maximum a posteriori estimate") will still be 51, because this probability distribution is just a rescaled truncation of the improper distribution we got at the outset. Or suppose we (rather arbitrarily) make the prior distribution for $N$ a Poisson distribution of mean $\mu$; then it turns out that for any plausible choice of $\mu$ the expectation comes out rather close to $\mu$ -- there's more information in the prior than in the likelihoods, so to speak.
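For the uniform-on-$[1,M]$ prior, the quoted numbers can be sanity-checked the same way (again a sketch of mine, with $M = 10^5$):

```python
from math import log, prod

def likelihood(n: int) -> float:
    # P(10 distinct pictures then a repeat | N = n), up to a constant factor
    return prod((n - k) / n for k in range(10)) / n

M = 100_000  # prior uniform on [1, M], zero beyond; N < 10 has zero likelihood
ns = range(10, M + 1)
weights = [likelihood(n) for n in ns]
expectation = sum(n * w for n, w in zip(ns, weights)) / sum(weights)

print(round(expectation))  # about 14000, as stated above
print(round(M / log(M)))   # the large-M approximation M/log M, about 8700
```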




The upshot of all this is




that I really don't think there's such a thing as a correct answer to the question. As it stands the answer is undefined for two different reasons; we can fix up both of them by specifying a suitable prior, but it turns out that different (but still fairly reasonable) priors can give extremely different answers; and typically the expectation, median and mode come out quite different, indicating that no single figure is going to serve very well to describe our actual beliefs after seeing what we saw.




Credit where due: hexomino did a lot of this before I did (with, unfortunately, an error, but we all make mistakes); all the actual calculations underlying the discussion above were done for me by Mathematica.



Incidentally,




in the real world my estimate would probably not be any of the above. I would expect a website offering cat pictures to have rather a lot of them, since the internet is made of cats, and therefore I'd guess that the repetition was most likely the result of some quirk in their software that made the images not be truly selected at random. (E.g., maybe they seed a random number generator with something involving the time, in some broken way that means that two requests exactly a minute apart are likely to produce the same output. Or something.) If I actually cared, I would take a lot more samples before trying to do any calculations.








    N=1,677



    Even with all of the complex mathematics being done, probability only gives us a way to calculate the answer in the absence of other information.



    Since you gave an example, but didn't specifically say it WASN'T your example, the most likely hypothesis is that the site you listed as an example was the site you visited.



    Random.cat actually lists directly on their website:




    Cat Count: 1677



    That's not enough cats!




    Math-based puzzles usually have to go obscure and hypothetical to make the setup work; this one, however, isn't purely hypothetical. The highest-probability hypothesis is that the site you listed IS the one you're referring to, and that site lists exactly how many pictures there are.



    Regardless of whether it's true or not, it IS the most probable hypothesis, absent new information.






    • Err.. Nah.. I wasn't planning to make this puzzle a lateral-thinking one lol. And random.cat is just plainly one example of such a website. – athin, 15 hours ago
    • Yeah, I figured as much, but that's why I made the answer. Intent aside, the question as actually phrased does promote using all available information to determine the most likely answer. :) – AHamilton, 13 hours ago




















    Edit: As pointed out by Gareth McCaughan, there is a mistake in this answer: the $(N-8)!$ in the denominator of the normalisation terms should be $(N-10)!$, in which case the sum diverges, so this approach will not work.



    I think the expectation value for the number of pictures in the database is




    $E(N) \approx 39.32$




    Reasoning




    Suppose that the number of cat pictures on the database is $N$ and we wish to calculate the probability of viewing $10$ pictures without repeat before seeing a repeat at the $11$th picture. This is just given by $$ P(\text{repeat at } 11 \mid N) = \left(\frac{N}{N}\right)\left(\frac{N-1}{N}\right)\left(\frac{N-2}{N}\right)\ldots\left(\frac{N-9}{N}\right)\left(\frac{10}{N}\right) = \frac{(N-1)!\,10}{(N-8)!\,N^{10}} $$ Of course, we can reverse the reasoning to make this into a likelihood for there being $N$ pictures in the database given the observations $$ L(N \mid \text{repeat at } 11) = \frac{(N-1)!\,10}{(N-8)!\,N^{10}} $$ Of course, this is not a probability when considered over all values of $N$. To make it so, we must normalise it. The normalisation constant is given by the sum over all possible values of $N$ (I obtained this answer using Wolfram Alpha) $$ \displaystyle\sum_{N=1}^{\infty} \frac{(N-1)!\,10}{(N-8)!\,N^{10}} = 10\zeta(3) + 3220\zeta(5) + 67690\zeta(7) + 130680\zeta(9) - \frac{2113425823775}{12999674453557248} - \frac{28\pi^4}{9} - \frac{560\pi^6}{27} - \frac{1876\pi^8}{135} - \frac{160\pi^{10}}{297} \approx 0.00761582 $$ We shall denote this normalisation constant by $C$. Then, the expectation value of the number of pictures in the database is given by $$ E(N) = \frac{1}{C}\displaystyle\sum_{N=1}^{\infty} N\,\frac{(N-1)!\,10}{(N-8)!\,N^{10}} = \frac{1}{C}\displaystyle\sum_{N=1}^{\infty} \frac{(N-1)!\,10}{(N-8)!\,N^{9}} $$ Again, Wolfram Alpha gives a long expression for the sum but when we divide by the normalisation constant, we find that $$ E(N) \approx 39.32 $$
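A quick numerical illustration of the divergence mentioned in the edit above (my sketch, using the corrected $(N-10)!$ term): each decade of $N$ adds roughly $10 \ln 10 \approx 23$ to the normalisation sum, just like a harmonic series scaled by $10$, so the sum diverges.

```python
from math import prod

def term(n: int) -> float:
    # 10 (n-1)! / ((n-10)! n^10)  =  (10/n) * (1 - 1/n)(1 - 2/n)...(1 - 9/n)
    return (10 / n) * prod((n - k) / n for k in range(1, 10))

# contribution of one full decade of N to the sum
decade = sum(term(n) for n in range(10_000, 100_000))
print(decade)  # close to 10*ln(10) ~ 23, and every later decade adds about the same
```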




    Point of Information




    As Gareth McCaughan pointed out in the comments, I am essentially using Bayesian reasoning with a flat prior. This might not be a sensible choice in the real world as I would expect to have some concept about how many pictures should be in such a database (i.e., 100 would be more likely than say 1000000000), so the answer should be taken with a pinch of salt.







    • You're implicitly assuming an (improper) flat prior for N. Of course you have to make some such assumption and this is a perfectly reasonable one. – Gareth McCaughan, 18 hours ago
    • @GarethMcCaughan yes, that is a good point. There may be a better choice given one's knowledge about what to expect for the number of pictures in such a database. – hexomino, 17 hours ago
    • You've got N-8 where I think you should have N-10. – Gareth McCaughan, 17 hours ago
    • ... and, hmm, I think the resulting thing then can't be normalized -- the relevant sum is infinite. (For large N the Nth term is roughly proportional to 1/N.) – Gareth McCaughan, 17 hours ago
    • @GarethMcCaughan Gah! Thank you. Yes, it seems that the series then diverges so this doesn't really work. – hexomino, 16 hours ago



















    Empirical investigation using MATLAB and a very procrastination-prone graduate student.



    For each possible $N$ of cats, let's unleash 10000 users on this website. We will count how many users experience OP's outcome (10 different cats followed by a repeat of cat #10).



    Nlist = 1:200;                  % candidate database sizes N
    Nusers = 10000;                 % simulated users per N
    probN = zeros(size(Nlist));
    for Ni = 1:length(Nlist)
        N = Nlist(Ni);
        like_OP = 0;
        for user = 1:Nusers
            cats = randi(N, 11, 1);              % 11 uniform draws from 1..N
            if length(unique(cats(1:10))) == 10 && cats(11) == cats(10)
                like_OP = like_OP + 1;           % first 10 distinct, 11th repeats the 10th
            end
        end
        probN(Ni) = like_OP / Nusers;
    end
    plot(Nlist, probN)


    Now we have a list of probabilities that for Ncats random user will share experience with OP.




    [plot: estimated probability of OP's experience vs. number of cats $N$]




    This process takes about 10 seconds on my macbook, and scales linearly with both the number of $N$ values we test and the number of users. I suspect it's a smooth curve, so let's try to zoom in a little bit:




    [plot: zoomed in, 200k users, $N$ from 30 to 80]




    The conclusion is that I need a faster computer, but the most likely cat count is probably in the 45-60 range.
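For readers without MATLAB, here is a rough Python port of the same experiment (my sketch, not the answerer's code), checked against the exact probability $\frac{N(N-1)\cdots(N-9)}{N^{10}} \cdot \frac{1}{N}$ for a single candidate $N$:

```python
import random
from math import prod

def exact_p(n: int) -> float:
    # P(first 10 draws all distinct AND the 11th equals the 10th)
    return prod((n - k) / n for k in range(10)) / n

def simulate(n: int, users: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = 0
    for _ in range(users):
        cats = [rng.randrange(n) for _ in range(11)]  # 11 uniform draws
        if len(set(cats[:10])) == 10 and cats[10] == cats[9]:
            hits += 1
    return hits / users

n = 51
p_exact, p_sim = exact_p(n), simulate(n, 200_000)
print(p_exact, p_sim)  # the two closely agree
```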






    • This lines up with the maximum likelihood estimate computed by Gareth McCaughan, $N=51$. – hexomino, 6 hours ago
    • @hexomino yeah, and convergence is extremely slow. – aaaaaa, 6 hours ago




















    Answer:




    N = 20 EDIT N = 81




    Reasoning:




    Rather than working out all the fancy distributions like the other answers, I'd like to make an assumption:
    the repeat occurred at the point where there was a >=50% chance of seeing a repeated image.

    This is then easy to calculate: for there to be a 50% chance of a repeat, the pictures already seen must cover half of the possible values. Therefore, the value of $N$ is twice the number seen before the repeat, which is $10 \cdot 2 = 20$.




    EDIT




    As @Bass pointed out, I'm forgetting about the birthday problem here. To generalise for this situation we can use:
    [image: the standard birthday-problem approximation for n(d), the number of draws needed for a 50% chance of a duplicate among d possibilities]

    Setting n(d) = 11, we can backtrack and calculate that d = 81.288, rounded to 81 pictures before a >50% chance of duplication.
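The approximation can also be cross-checked against an exact birthday-problem computation (a sketch of my own; the exact flip-over point it finds may differ by a picture or two from the 81.288 approximation above):

```python
def p_all_distinct(draws: int, d: int) -> float:
    # probability that `draws` uniform picks from d values are all distinct
    p = 1.0
    for k in range(draws):
        p *= (d - k) / d
    return p

# smallest database size d for which 11 draws have LESS than a 50% chance
# of containing a duplicate, i.e. the flip-over point of the argument above
d = 11
while 1 - p_all_distinct(11, d) >= 0.5:
    d += 1
print(d)  # in the same ballpark as the 81.288 approximation
```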







    • Aren't you kind of ignoring the birthday paradox here? – Bass, 16 hours ago
    • @Bass yes I was, edited to allow for that. – AHKieran, 15 hours ago











    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "559"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    noCode: true, onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fpuzzling.stackexchange.com%2fquestions%2f79787%2fthis-website-needs-more-cat-pictures%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    5 Answers
    5






    active

    oldest

    votes








    5 Answers
    5






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    7












    $begingroup$

    Proceeding along the same lines as hexomino, I get a different result. The expectation




    is undefined for two reasons. First of all, you only have an expectation if you have a probability distribution, and to get a probability distribution in this scenario you need to start with a "prior" probability distribution in advance of seeing the outcome. Second, if we use the "obvious" improper uniform prior (i.e., assume all N equally likely) then our "posterior" probability distribution is also improper (i.e., you can't make the "probabilities" add up to a finite value).




    The place where I (ha!) diverge from hexomino's answer is




    where he rewrites $frac{N(N-1)cdots(N-9)}{N^{10}}$ as $frac{(N-1)!10}{(N-8)!N^{10}}$, where I think he's made a sign error -- it should be $frac{(N-1)!10}{(N-10)!N^{10}}$. This means that the $N$th value is, for large $N$, roughly proportional to $1/N$, whose sum diverges.




    We can try to deal with this inconvenient situation in three ways. First,




    we can give up on computing an expectation and compute something else that does make sense for an improper probability distribution. For instance, the mode ("maximum likelihood estimate"). This turns out to be at $N=51$, though the value at $N=52$ is very, very close to being the same.




    Second,




    we can adopt a different "general-purpose" prior. For instance, we might use the "logarithmic prior" where the prior "probability" of $N$ is taken to be $1/N$. (For many cases where a variable is known to take positive values only, this is a reasonable approach, though the usual handwavy arguments for it don't really apply here.) In this case, the posterior probability for $N$ is roughly proportional to $1/N^2$, whose sum is finite, so we do get an actual probability distribution -- but, alas, not one that has an expectation, because the sum we'd need to compute for that diverges again. Still, in this case we can stil compute the mode (as above) and also the median. The mode is at $N=29$ and the median is anywhere between $N=77$ and $N=78$. (The tail of this distribution is rather "fat", which is why the median is so much bigger than the mode. The mean, again, is infinite -- it's more sensitive than the median to that fat tail.)




    Third,




    we can make use of our actual knowledge about cat picture websites and try to come up with a more realistic prior based on that knowledge. There are an awful lot of fairly plausible ways to do this, and unfortunately they will all give different answers. The upside is that if our prior falls off rapidly enough for large $N$ (which it will -- indeed, since presumably only finitely many cat pictures have ever been taken, it is zero for large enough $N$) then everything converges and we can compute expectations to our hearts' content. For instance, suppose we choose a prior that's uniform on $[1,M]$ and zero for $N>M$; then there's a rather ugly explicit-ish formula that for large $M$ is approximately $M/log M$, but that limit is approached very slowly -- e.g., for $M=10^5$ the expectation is about 14000 while the approximation is about 8700. For any of these distributions with $Mge51$ the mode ("maximum a posteriori estimate") will still be 51, because this probability distribution is just a rescaled truncation of the improper distribution we got at the outset. Or suppose we (rather arbitrarily) make the prior distribution for $N$ a Poisson distribution of mean $mu$; then it turns out that for any plausible choice of $mu$ the expectation comes out rather close to $mu$ -- there's more information in the prior than in the likelihoods, so to speak.




    The upshot of all this is




    that I really don't think there's such a thing as a correct answer to the question. As it stands the answer is undefined for two different reasons; we can fix up both of them by specifying a suitable prior, but it turns out that different (but still fairly reasonable) priors can give extremely different answers; and typically the expectation, median and mode come out quite different, indicating that no single figure is going to serve very well to describe our actual beliefs after seeing what we saw.




    Credit where due: hexomino did a lot of this before I did (with, unfortunately, an error, but we all make mistakes); all the actual calculations underlying the discussion above were done for me by Mathematica.



    Incidentally,




    in the real world my estimate would probably not be any of the above. I would expect a website offering cat pictures to have rather a lot of them, since the internet is made of cats, and therefore I'd guess that the repetition was most likely the result of some quirk in their software that made the images not be truly selected at random. (E.g., maybe they seed a random number generator with something involving the time, in some broken way that means that two requests exactly a minute apart are likely to produce the same output. Or something.) If I actually cared, I would take a lot more samples before trying to do any calculations.







    share|improve this answer











    $endgroup$


















      7












      $begingroup$

      Proceeding along the same lines as hexomino, I get a different result. The expectation




      is undefined for two reasons. First of all, you only have an expectation if you have a probability distribution, and to get a probability distribution in this scenario you need to start with a "prior" probability distribution in advance of seeing the outcome. Second, if we use the "obvious" improper uniform prior (i.e., assume all N equally likely) then our "posterior" probability distribution is also improper (i.e., you can't make the "probabilities" add up to a finite value).




      The place where I (ha!) diverge from hexomino's answer is




      where he rewrites $frac{N(N-1)cdots(N-9)}{N^{10}}$ as $frac{(N-1)!10}{(N-8)!N^{10}}$, where I think he's made a sign error -- it should be $frac{(N-1)!10}{(N-10)!N^{10}}$. This means that the $N$th value is, for large $N$, roughly proportional to $1/N$, whose sum diverges.




      We can try to deal with this inconvenient situation in three ways. First,




      we can give up on computing an expectation and compute something else that does make sense for an improper probability distribution. For instance, the mode ("maximum likelihood estimate"). This turns out to be at $N=51$, though the value at $N=52$ is very, very close to being the same.




      Second,




      we can adopt a different "general-purpose" prior. For instance, we might use the "logarithmic prior" where the prior "probability" of $N$ is taken to be $1/N$. (For many cases where a variable is known to take positive values only, this is a reasonable approach, though the usual handwavy arguments for it don't really apply here.) In this case, the posterior probability for $N$ is roughly proportional to $1/N^2$, whose sum is finite, so we do get an actual probability distribution -- but, alas, not one that has an expectation, because the sum we'd need to compute for that diverges again. Still, in this case we can stil compute the mode (as above) and also the median. The mode is at $N=29$ and the median is anywhere between $N=77$ and $N=78$. (The tail of this distribution is rather "fat", which is why the median is so much bigger than the mode. The mean, again, is infinite -- it's more sensitive than the median to that fat tail.)




      Third,




      we can make use of our actual knowledge about cat picture websites and try to come up with a more realistic prior based on that knowledge. There are an awful lot of fairly plausible ways to do this, and unfortunately they will all give different answers. The upside is that if our prior falls off rapidly enough for large $N$ (which it will -- indeed, since presumably only finitely many cat pictures have ever been taken, it is zero for large enough $N$) then everything converges and we can compute expectations to our hearts' content. For instance, suppose we choose a prior that's uniform on $[1,M]$ and zero for $N>M$; then there's a rather ugly explicit-ish formula that for large $M$ is approximately $M/log M$, but that limit is approached very slowly -- e.g., for $M=10^5$ the expectation is about 14000 while the approximation is about 8700. For any of these distributions with $Mge51$ the mode ("maximum a posteriori estimate") will still be 51, because this probability distribution is just a rescaled truncation of the improper distribution we got at the outset. Or suppose we (rather arbitrarily) make the prior distribution for $N$ a Poisson distribution of mean $mu$; then it turns out that for any plausible choice of $mu$ the expectation comes out rather close to $mu$ -- there's more information in the prior than in the likelihoods, so to speak.




      The upshot of all this is




      that I really don't think there's such a thing as a correct answer to the question. As it stands the answer is undefined for two different reasons; we can fix up both of them by specifying a suitable prior, but it turns out that different (but still fairly reasonable) priors can give extremely different answers; and typically the expectation, median and mode come out quite different, indicating that no single figure is going to serve very well to describe our actual beliefs after seeing what we saw.




      Credit where due: hexomino did a lot of this before I did (with, unfortunately, an error, but we all make mistakes); all the actual calculations underlying the discussion above were done for me by Mathematica.



      Incidentally,




      in the real world my estimate would probably not be any of the above. I would expect a website offering cat pictures to have rather a lot of them, since the internet is made of cats, and therefore I'd guess that the repetition was most likely the result of some quirk in their software that made the images not be truly selected at random. (E.g., maybe they seed a random number generator with something involving the time, in some broken way that means that two requests exactly a minute apart are likely to produce the same output. Or something.) If I actually cared, I would take a lot more samples before trying to do any calculations.







        answered 16 hours ago









        Gareth McCaughan

            N=1,677



            Even with all of the complex mathematics being done, probability only gives us a way to calculate the answer in the absence of other information.



            Since you gave an example, but didn't specifically say it WASN'T the site in question, the most likely hypothesis is that the site you listed as an example was the site you visited.



            Random.cat actually lists directly on their website:




            Cat Count: 1677



            That's not enough cats!




            In math-based puzzles you usually have to go obscure and hypothetical to make them work; in this example, however, it isn't purely hypothetical. The highest probability is that the site you listed IS the one you're referring to, and that site lists exactly how many pictures there are.



            Regardless of whether it's true or not, it IS the most probable answer, absent new information.






            – athin (15 hours ago): Err.. Nah.. I wasn't planning to make this puzzle as a lateral-thinking one lol. And random.cat is just plainly one of the examples of such a website.

            – AHamilton (13 hours ago): Yeah, I figured as much, but that's why I made the answer. Intent aside, the actual question as phrased does promote using all available information to determine the most likely probability. :)
















            answered 15 hours ago









            AHamilton

            Edit: As pointed out by Gareth McCaughan, there is a mistake in this answer: the $(N-8)!$ factor in the likelihood (and hence in the normalisation constant) should be $(N-10)!$, in which case the sum diverges, so this approach will not work.



            I think the expectation value for the number of pictures in the database is




            $E(N) approx 39.32 $




            Reasoning




            Suppose that the number of cat pictures on the database is $N$ and we wish to calculate the probability of viewing $10$ pictures without repeat before seeing a repeat at the $11$th picture. This is just given by $$ P(\text{repeat at } 11 \mid N) = \left( \frac{N}{N}\right) \left( \frac{N-1}{N}\right)\left( \frac{N-2}{N}\right)\ldots \left( \frac{N-9}{N}\right) \left( \frac{10}{N}\right) = \frac{(N-1)!\,10}{(N-8)!\, N^{10}} $$ Of course, we can reverse the reasoning to make this into a likelihood for there being $N$ pictures in the database given the observations $$ L(N \mid \text{repeat at } 11) = \frac{(N-1)!\,10}{(N-8)!\, N^{10}}$$ Of course, this is not a probability when considered over all values of $N$. To make it so, we must normalise it. The normalisation constant is given by the sum over all possible values of $N$ (I obtained this answer using Wolfram Alpha) $$ \displaystyle\sum_{N=1}^{\infty} \frac{(N-1)!\,10}{(N-8)!\, N^{10}} = 10 \zeta(3) + 3220\zeta(5) + 67690\zeta(7) + 130680\zeta(9) - \frac{2113425823775}{12999674453557248} - \frac{28 \pi^4}{9} - \frac{560 \pi^6}{27} - \frac{1876 \pi^8}{135} - \frac{160 \pi^{10}}{297} \approx 0.00761582$$ We shall denote this normalisation constant by $C$. Then, the expectation value of the number of pictures in the database is given by $$E(N) = \frac{1}{C} \displaystyle\sum_{N=1}^{\infty} N \frac{(N-1)!\,10}{(N-8)!\, N^{10}} = \frac{1}{C}\displaystyle\sum_{N=1}^{\infty} \frac{(N-1)!\,10}{(N-8)!\, N^{9}}$$ Again, Wolfram Alpha gives a long expression for the sum but when we divide by the normalisation constant, we find that $$ E(N) \approx 39.32 $$




            Point of Information




            As Gareth McCaughan pointed out in the comments, I am essentially using Bayesian reasoning with a flat prior. This might not be a sensible choice in the real world, as I would expect to have some concept of how many pictures should be in such a database (e.g., 100 would be more likely than, say, 1000000000), so the answer should be taken with a pinch of salt.
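Gareth McCaughan's correction can also be verified numerically. In this sketch of mine, with $(N-10)!$ in place of $(N-8)!$ the terms behave like $10/N$ for large $N$, so each extra decade of $N$ adds roughly $10\ln 10 \approx 23$ to the partial sums of the would-be normalisation series, which therefore diverges:

```python
def corrected_likelihood(n):
    # L(N) = 10 * (N-1)! / ((N-10)! * N**10)
    #      = (10/N) * prod_{k=1}^{9} (N-k)/N   -- roughly 10/N for large N
    if n < 10:
        return 0.0
    p = 10.0 / n
    for k in range(1, 10):
        p *= (n - k) / n
    return p

# Partial sums of the normalisation series at N = 10**3, 10**4, 10**5:
partial = []
s, checkpoints = 0.0, {10**3, 10**4, 10**5}
for n in range(1, 10**5 + 1):
    s += corrected_likelihood(n)
    if n in checkpoints:
        partial.append(s)

# Each extra decade adds about 10*ln(10), i.e. the sum grows without bound,
# so the flat-prior "posterior" has no finite normaliser.
gaps = [b - a for a, b in zip(partial, partial[1:])]
```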







            – Gareth McCaughan (18 hours ago): You're implicitly assuming an (improper) flat prior for N. Of course you have to make some such assumption and this is a perfectly reasonable one.

            – hexomino (17 hours ago): @GarethMcCaughan yes, that is a good point. There may be a better choice given one's knowledge about what to expect for the number of pictures in such a database.

            – Gareth McCaughan (17 hours ago): You've got N-8 where I think you should have N-10.

            – Gareth McCaughan (17 hours ago): ... and, hmm, I think the resulting thing then can't be normalized -- the relevant sum is infinite. (For large N the Nth term is roughly proportional to 1/N.)

            – hexomino (16 hours ago): @GarethMcCaughan Gah! Thank you. Yes, it seems that the series then diverges so this doesn't really work.
















            answered 18 hours ago









            hexomino

            41k3122191




            41k3122191








• You're implicitly assuming an (improper) flat prior for $N$. Of course you have to make some such assumption and this is a perfectly reasonable one. – Gareth McCaughan, 18 hours ago

• @GarethMcCaughan yes, that is a good point. There may be a better choice given one's knowledge about what to expect for the number of pictures in such a database. – hexomino, 17 hours ago

• You've got $N-8$ where I think you should have $N-10$. – Gareth McCaughan, 17 hours ago

• ... and, hmm, I think the resulting thing then can't be normalized -- the relevant sum is infinite. (For large $N$ the $N$th term is roughly proportional to $1/N$.) – Gareth McCaughan, 17 hours ago

• @GarethMcCaughan Gah! Thank you. Yes, it seems that the series then diverges so this doesn't really work. – hexomino, 16 hours ago

            Empirical investigation using MATLAB and a very procrastination-prone graduate student.



For each possible number $N$ of cats, let's unleash 10000 users on this website and count how many of them experience OP's outcome (10 different cats followed by an 11th that matches cat #10).



Nlist = 1:200;                      % candidate database sizes
Nusers = 10000;                     % simulated users per size
probN = zeros(size(Nlist));         % preallocate the result
for Ni = 1:length(Nlist)
    N = Nlist(Ni);
    like_OP = 0;
    for user = 1:Nusers
        cats = randi(N, 11, 1);     % 11 uniform draws from N cats
        % OP's outcome: first 10 all distinct, 11th repeats the 10th
        if length(unique(cats(1:10))) == 10 && cats(11) == cats(10)
            like_OP = like_OP + 1;
        end
    end
    probN(Ni) = like_OP / Nusers;
end
plot(Nlist, probN)


Now we have, for each number of cats, the probability that a random user shares OP's experience.




[plot: estimated probability vs. $N$ for $N = 1\ldots200$]




This process takes about 10 seconds on my MacBook, and scales linearly with both the number of $N$ values tested and the number of users. I suspect the underlying curve is smooth, so let's zoom in a little:




[plot: estimated probability vs. $N$ for $N = 30\ldots80$, 200k users each]




The conclusion is that I need a faster computer, but the likeliest cat count is probably in the 45-60 range.
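For readers without MATLAB, here is a rough Python/NumPy equivalent of the simulation above (a sketch of mine, not from the original answer; the seed and trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility
N_USERS = 100_000                # simulated users per database size

def p_like_op(n_cats):
    """Estimate P(first 10 pictures all distinct, 11th equal to the 10th)
    for a database of n_cats pictures, by Monte Carlo simulation."""
    draws = rng.integers(0, n_cats, size=(N_USERS, 11))
    # First 10 distinct <=> after sorting, no two adjacent values are equal.
    first10 = np.sort(draws[:, :10], axis=1)
    distinct10 = np.all(np.diff(first10, axis=1) != 0, axis=1)
    repeat = draws[:, 10] == draws[:, 9]
    return np.mean(distinct10 & repeat)

# The estimated probability should peak somewhere in the 45-60 range.
print(p_like_op(10), p_like_op(51), p_like_op(200))
```

Vectorising over users (one `(N_USERS, 11)` array per database size) makes each estimate take a fraction of a second, so the slow convergence is easier to fight with more trials.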






• This lines up with the maximum likelihood estimate computed by Gareth McCaughan, $N=51$. – hexomino, 6 hours ago

• @hexomino yeah and convergence is extremely slow. – aaaaaa, 6 hours ago
















answered 7 hours ago
aaaaaa
            Answer:




            N = 20 EDIT N = 81




            Reasoning:




Rather than working out all the fancy distributions like the other answers, I'd like to start from an assumption: the case of the $N$th image repeating a previous image occurred when there was a $\geq 50\%$ chance of it repeating.

This is then easily calculable: for there to be a $50\%$ chance of a repeat, the viewer needs to have seen half of the possible pictures. Therefore, in this case, the value of $N$ is twice the number seen before the repeat, which is $10 \cdot 2 = 20$.




            EDIT




As @Bass pointed out, I was forgetting about the birthday problem here. To generalise for this situation, we can use the standard birthday-problem approximation for $n(d)$, the number of draws needed for a $>50\%$ collision chance among $d$ possibilities:

[image: birthday-problem approximation formula for $n(d)$]

With $n(d) = 11$, we can backtrack and calculate that $d = 81.288$, rounded to $81$ pictures, before a $>50\%$ chance of duplication.
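As a cross-check of my own (not part of the original answer), the birthday probability can also be computed exactly in a few lines of Python: for each candidate database size $d$, compute the exact probability that $11$ uniform draws contain a collision, and find the largest $d$ for which that probability is still at least $1/2$. The exact threshold lands at $d = 82$, in the same ballpark as the approximate $81.288$:

```python
def p_collision(d, n=11):
    """Exact probability that n uniform draws from d pictures
    contain at least one repeat (the birthday probability)."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (d - k) / d
    return 1.0 - p_distinct

# Largest database size for which 11 views still collide >= 50% of the time.
threshold = max(d for d in range(11, 500) if p_collision(d) >= 0.5)
print(threshold)  # 82
```

The collision probability is monotonically decreasing in $d$, so scanning a generous range and taking the maximum qualifying $d$ is enough.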







• Aren't you kind of ignoring the birthday paradox here? – Bass, 16 hours ago

• @Bass yes i was, edited to allow for that. – AHKieran, 15 hours ago
















edited 15 hours ago
answered 16 hours ago
AHKieran