Monday, January 15, 2018

An Um_nik week

This week's contests shared one of the problemsetters — and the winner. On Monday, Codeforces hosted its Hello 2018 round (problems, results, top 5 on the left, analysis). With the hardest problem requiring too much piecewise polynomial integration, the competition came down to the remaining seven. Only Um_nik could get them all, and even with 24 minutes to spare — congratulations!

On Sunday, AtCoder Grand Contest 020 started the race to Japan (problems, results, top 5 on the left, my screencast, analysis). Once again the hardest problem by tourist was a tiny bit too hard (and I even solved it after the contest by integrating polynomials, probably inspired by the analysis of the previous contest), and once again Um_nik was way faster than everybody else on the remaining problems — big congratulations again!

I found problem C quite cute. You are given n<=2000 positive integers, each up to 2000. Consider all 2^n-1 nonempty subsets, and compute the sum of the numbers in each subset. Now sort those sums, and find the median sum — the 2^(n-1)-th in the sorted order. You need to output this median sum. Time limit is 2 seconds.

In my last post, I discussed what was arguably the hardest contest problem of 2017: you are given an undirected graph with at most 200000 vertices and edges. Your goal is to split its vertices into four disjoint groups A, B, C, D in such a way that:
  • The subgraph formed by groups A+B is connected and has at least 2 vertices.
  • The subgraph formed by groups C+D is connected and has at least 2 vertices.
  • Each vertex in A has an edge to each vertex in C.
  • There are no other edges between subgraphs A+B and C+D.
If there are many possible solutions, you need to output any one of them.

Thanks to great ideas and pointers by Swistakk, mnbvmar and lewin, we were able to put together an O(n*log n) solution — you can check it out in this comment thread.

Thanks for reading, and check back next week!

Monday, January 8, 2018

An Otherland week

The first week of 2018 was quite calm, with the most devoted taking stabs at the still-ongoing Prime New Year contest by snarknews. Only one problem remains unsolved there, and it looks like it might take some team effort to crack it. I think it's within the spirit of the contest to discuss its problems publicly (after all, many even have analysis posted online), so I propose to do just that. Please skip the following paragraphs if you don't want spoilers!

The problem goes like this: you are given an undirected graph with at most 200000 vertices and edges. Your goal is to split its vertices into four disjoint groups A, B, C, D in such a way that:
  • The subgraph formed by groups A+B is connected and has at least 2 vertices.
  • The subgraph formed by groups C+D is connected and has at least 2 vertices.
  • Each vertex in A has an edge to each vertex in C.
  • There are no other edges between subgraphs A+B and C+D.
If there are many possible solutions, you need to output any one of them.

I have been thinking about the problem for some time, but only have the following not very helpful observations:
  • If the graph has an articulation point, then we can take this point as set A, any of the components it separates as set B, and the rest as C+D. So we can assume the graph is vertex-biconnected.
  • The answer is not always "Yes" - the smallest counterexample is the graph with 4 vertices and 5 edges. This example also shows that if a biconnected graph has a vertex of degree 2, then this vertex and both adjacent vertices must be in the same half of our splitting, since both ends of a splitting edge must have degree of at least 3 (two splitting edges plus one inside the half).
  • There are also counterexamples without vertices of degree 2.
  • There might be some hashing trick involved. For example, suppose we assign a random number a_i to each vertex, and compute for each vertex the value sum(a_i-a_j), where the summation goes over all adjacent vertices j. Then if we add up those values within one half of the graph (A+B in the above terminology), almost everything will cancel out, leaving us with |C|*a_A-|A|*a_C, where a_A is the sum of a_i over A and a_C over C. But where to go from here?..
Please share your ideas!

In my last post, I held a vote for the best problem of 2017. And the winner is this AtCoder problem by maroonrk: you are given two rooted trees on the same set of 10^5 vertices. You need to assign integer weights to all vertices in such a way that for each subtree of each of the trees the sum of weights is either 1 or -1, or report that it's impossible. The solution is explained in this post. Congratulations to the problemsetter and to AtCoder, and thanks to all problemsetters of 2017!

Thanks for reading, and check back next week.

Sunday, December 31, 2017

Best problems of 2017

I have reviewed all problems I've highlighted this year in my blog, and also the problems sent by the readers in response to my question — thanks a lot for your input! I've distilled everything to a shortlist of five problems (in chronological order):

  • The Open Cup problem about interactively proving the pigeonhole principle, by tunyash, with solution in this post.
  • The AtCoder problem about moving two chips on a subset of a DAG, by sugim48, with solution in this post.
  • The IPSC problem about picking optimal sequences of red and black cards to get the first match, by misof, with solution in this post.
  • The AtCoder problem about assigning numbers to shared vertices of two trees so that all subtrees sum to +-1, by maroonrk, with solution in this post.
  • The Open Cup problem about interactively exploring a maze on Mars with a communication delay, by Gassa, discussed in this post.

What is the best problem of 2017?

As usual around New Year, snarknews is hosting the Prime New Year contest, which consists of problems given at top-level team competitions in 2017 but not solved by anybody there. If you like a real challenge and don't have much to do during the holidays, here's your link :)

Guten Rutsch!

A sparse week

This week had the now traditional last contest of the year: Good Bye 2017 at Codeforces (problems, results, top 5 on the left, analysis, my screencast). The hardest problem H, after peeling off some layers, ended up being about finding a proper vertex coloring for a graph with n<=23 vertices using the smallest number of colors. My first thought was: surely for such a classical problem the best approaches are well-known? However, it turned out that the algorithms I could remember were an O(3^n) subset dynamic programming and various heuristics to speed up backtracking, both of which seemed too slow. Thanks to this paper I have now learned how to do this in O(n*2^n), see point 4.2 "Inclusion-exclusion". Always nice to discover better approaches to classical problems!
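
Here is a minimal sketch of that inclusion-exclusion computation, under a couple of assumptions of mine: vertices are numbered 0..n-1, adj[v] is the bitmask of neighbors of v, and the alternating sum is evaluated modulo a large prime, so strictly speaking the answer is only correct with very high probability (the true sum could in principle be a nonzero multiple of the modulus).

#include <bits/stdc++.h>
using namespace std;

// Chromatic number of a graph with n <= 23 vertices in O(n * 2^n).
// adj[v] is the bitmask of neighbours of vertex v.
int chromaticNumber(int n, const vector<int>& adj) {
    const unsigned long long MOD = (1ULL << 61) - 1;  // a large prime modulus
    int full = 1 << n;
    // ind[S] = number of independent sets (including the empty one) inside S;
    // it is at most 2^n, so it fits in 32 bits exactly.
    vector<unsigned int> ind(full);
    ind[0] = 1;
    for (int S = 1; S < full; ++S) {
        int v = __builtin_ctz(S);
        // either v is not taken, or v is taken and its closed neighbourhood is excluded
        ind[S] = ind[S & ~(1 << v)] + ind[S & ~(adj[v] | (1 << v))];
    }
    // The graph is k-colorable iff the sum over S of (-1)^(n-|S|) * ind[S]^k is positive:
    // that sum counts ordered k-tuples of independent sets covering all vertices.
    vector<unsigned long long> pw(full, 1);   // pw[S] = ind[S]^k mod MOD
    for (int k = 1; k <= n; ++k) {
        unsigned long long sum = 0;
        for (int S = 0; S < full; ++S) {
            pw[S] = (unsigned __int128)pw[S] * ind[S] % MOD;
            if ((n - __builtin_popcount(S)) % 2 == 0) sum = (sum + pw[S]) % MOD;
            else sum = (sum + MOD - pw[S]) % MOD;
        }
        if (sum != 0) return k;
    }
    return n;
}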

In my previous summary, I have mentioned a TopCoder problem: you are given n=2000000 tasks, to be done in m=2000 days, at most one task per day (so most of the tasks will end up not being done). For each task you know three numbers: the segment of days when it can be done (from l_i to r_i), and the profit c_i from doing it. You need to pick the task to be done in each day in such a way that the total profit is maximized.

This can be naturally viewed as a weighted maximum matching problem, which thanks to the transversal matroid can be solved by trying to add tasks to the matching in decreasing order of profit, looking for an augmenting path each time, and only resetting the "visited" state of each vertex when the matching is increased (this is called "Kuhn's algorithm" in the Russian-speaking community, but I couldn't find an authoritative link for it). Since the matching is only increased m times, we process each edge of the graph at most m times. The problem is that we have O(n*m) edges, so the total running time seems to be O(n*m^2). However, we can notice that for each task that we end up not adding to the matching we process its edges only once (since we will never visit it in subsequent iterations), so the number of actually revisited edges is O(m^2), and the running time is only O(n*m+m^3), which is of course still too slow.
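
A minimal sketch of this approach, under conventions of my own (task i can be done on days l[i]..r[i] with profit c[i], and the "visited" marks are kept on days):

#include <bits/stdc++.h>
using namespace std;

int m;                        // number of days
vector<int> l, r;             // task i can be done on any day in [l[i], r[i]]
vector<long long> c;          // profit of task i
vector<int> matchedTask;      // matchedTask[day] = task currently assigned to this day, or -1
vector<char> visited;         // visited[day], cleared only after a successful augmentation

bool augment(int task) {
    for (int day = l[task]; day <= r[task]; ++day) {
        if (visited[day]) continue;
        visited[day] = true;
        if (matchedTask[day] == -1 || augment(matchedTask[day])) {
            matchedTask[day] = task;
            return true;
        }
    }
    return false;
}

long long maxProfit(int n) {
    matchedTask.assign(m, -1);
    visited.assign(m, 0);
    // By the transversal matroid argument, taking tasks greedily in decreasing
    // order of profit and keeping those that can still be matched is optimal.
    vector<int> order(n);
    iota(order.begin(), order.end(), 0);
    sort(order.begin(), order.end(), [&](int a, int b) { return c[a] > c[b]; });
    long long total = 0;
    for (int task : order) {
        if (augment(task)) {
            total += c[task];
            visited.assign(m, 0);   // reset marks only when the matching grows
        }
    }
    return total;
}

Note that this version still touches every day in every task's range at least once, which is exactly the O(n*m) term that the trick described next gets rid of.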

But now we can use another relatively standard trick: let's reduce the number of "active" edges from O(m^2) to O(m*log m) in the following manner: instead of a matching problem we'll now have a flow problem, and we'll add auxiliary vertices between the left and right parts. The vertices will correspond to segments. First, we'll pick the middle point b=m/2, and add vertices corresponding to segments [1,b], [2,b], ..., [b-1,b], [b,b], and infinite capacity edges in this manner: [1,b]->[2,b]->...->[b,b]; [1,b]->1; [2,b]->2; ..., [b,b]->b. The idea is that if we add an edge from some vertex in the left part to a segment [a,b], then the flow can go to any vertex in the right part between a and b using those infinite edges, and only to those vertices. We handle the other half in a similar way: [b+1,b+1]<-[b+1,b+2]<-...<-[b+1,m-1]<-[b+1,m]. These two sets of segments already allow us to handle any task that can be done on days b and b+1 using only two edges: to [l_i,b] and to [b+1,r_i].

We can then apply the same procedure recursively to segments [1,b] and [b+1,m]: for example, we'll pick the middle point c=b/2, and build the auxiliary vertices [1,c]->[2,c]->...->[c,c] and [c+1,c+1]<-[c+1,c+2]<-...<-[c+1,b], and so on. Overall we'll have log m sets of m auxiliary vertices each (one per level of recursion), with two outgoing edges for each vertex, and every task can now be handled with at most two edges to auxiliary vertices instead of O(m) edges to the right part. This approach resembles the centroid decomposition of a tree applied to a chain; we could also have used a sparse table to achieve the same result.
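
Here is a rough sketch of this construction, with conventions of my own: days are vertices 0..m-1, each task gets its own vertex elsewhere, newNode() allocates auxiliary vertices, and addEdge() records an infinite-capacity directed edge of the flow network.

#include <bits/stdc++.h>
using namespace std;

int m, nodeCount;                  // day vertices are 0..m-1; task vertices come after them
vector<pair<int,int>> edges;       // infinite-capacity directed edges of the flow network
vector<vector<int>> pref, suf;     // pref[level][i] = vertex of segment [i,b] on that level,
                                   // suf[level][j]  = vertex of segment [b+1,j]

int newNode() { return nodeCount++; }
void addEdge(int u, int v) { edges.push_back({u, v}); }

// Build the two chains of auxiliary vertices for days lo..hi on the given level.
void build(int lo, int hi, int level) {
    if (lo >= hi) return;                             // a single day needs no auxiliary vertices
    int b = (lo + hi) / 2;
    for (int i = lo, prev = -1; i <= b; ++i) {        // [lo,b] -> [lo+1,b] -> ... -> [b,b]
        int v = newNode();
        pref[level][i] = v;
        addEdge(v, i);                                // [i,b] -> day i
        if (prev != -1) addEdge(prev, v);             // [i-1,b] -> [i,b]
        prev = v;
    }
    for (int j = hi, prev = -1; j >= b + 1; --j) {    // [b+1,hi] -> [b+1,hi-1] -> ... -> [b+1,b+1]
        int v = newNode();
        suf[level][j] = v;
        addEdge(v, j);                                // [b+1,j] -> day j
        if (prev != -1) addEdge(prev, v);             // [b+1,j+1] -> [b+1,j]
        prev = v;
    }
    build(lo, b, level + 1);
    build(b + 1, hi, level + 1);
}

// Attach the task that can be done on days li..ri, using at most two edges.
void connectTask(int taskVertex, int li, int ri) {
    int lo = 0, hi = m - 1, level = 0;
    while (lo < hi) {
        int b = (lo + hi) / 2;
        if (ri <= b) { hi = b; ++level; }             // the segment lies in the left half
        else if (li > b) { lo = b + 1; ++level; }     // the segment lies in the right half
        else {                                        // the segment spans days b and b+1
            addEdge(taskVertex, pref[level][li]);
            addEdge(taskVertex, suf[level][ri]);
            return;
        }
    }
    addEdge(taskVertex, lo);                          // the segment is a single day
}

void initAuxiliary(int numTasks) {
    nodeCount = m + numTasks;
    int levels = 1;
    while ((1 << levels) < m) ++levels;
    pref.assign(levels + 1, vector<int>(m, -1));
    suf.assign(levels + 1, vector<int>(m, -1));
    build(0, m - 1, 0);
}

The greedy augmenting-path approach from above then runs on this network, with each task and each day still having unit capacity; the paths simply route through the auxiliary vertices.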

Now our graph has O(m*log m) active edges, and additional O(n) edges that are visited only once, so the total running time is O(n+m^2*log m), which is fast enough.

That's it for the contests of 2017 — thanks for reading, and check back tomorrow (hopefully) for the best problems of 2017!

Saturday, December 30, 2017

A quadratic week

Last week had a relatively high density of contests, so much so that two of them collided: Codeforces Round 453 and TopCoder SRM 726 took place at the same time on Tuesday. The Codeforces round started 25 minutes earlier, so we'll begin with it (problems, results, top 5 on the left, analysis). dotorya had chosen a seemingly suboptimal order in which to solve the first four problems, as A and B required a disproportionate amount of time relative to their value, but he was faster than the competition and thus still ended up in first place — well done!

TopCoder SRM 726 at roughly the same time attracted the second half of the community (problems, results, top 5 on the left, my screencast). The hard problem proved decisive: you are given 2000000 tasks, to be done in 2000 days, at most one task per day (so most of the tasks will end up not being done). For each task you know three numbers: the segment of days when it can be done (from l_i to r_i), and the profit c_i from doing it. You need to pick the task to be done on each day in such a way that the total profit is maximized. This is naturally reduced to a weighted maximum matching problem, but surely it will time out with 2 million vertices in the graph — or will it?

Codeforces ran another round, Round 454, on Saturday (problems, results, top 5 on the left, analysis). All problems were tractable this time for moejy0viiiiiv and Radewoosh, and in the case of the former with 30 minutes to spare. Congratulations on the victory!

In my previous summary, I have mentioned an Open Cup problem: you are given an integer n with at most a million digits. You need to determine if it's a perfect square or not. You don't need to actually find the square root.

As is often the case in problems with a Yes/No answer, it suffices to find an algorithm that always gets one of the two answers right, and gets the other one right only with some probability greater than zero. We can then apply it repeatedly, so that the probability of not finding the correct answer decreases exponentially and becomes essentially zero.

Let's pick a prime number p. If n is a perfect square, then n mod p is always a quadratic residue. If not, then assuming p is picked randomly the probability of n mod p being a quadratic residue is roughly 0.5, since roughly half of all numbers mod p are quadratic residues (this is merely handwaving; can anyone prove this probability estimation formally?).

So we can pick, say, 30 random prime numbers, and check if n is a quadratic residue modulo each of them. If at least one says no, then the overall answer is no, otherwise it's yes. Checking if a number is a quadratic residue modulo p is done by checking if n^((p-1)/2)=1 (mod p).
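
A minimal sketch of this check, assuming the input is given as a decimal string (the primality test and the prime range are my own arbitrary choices):

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

bool isPrime(ll p) {                       // trial division is fine for p up to ~10^9
    if (p < 2) return false;
    for (ll d = 2; d * d <= p; ++d)
        if (p % d == 0) return false;
    return true;
}

ll modPow(ll a, ll e, ll p) {
    ll r = 1 % p;
    for (a %= p; e > 0; e >>= 1, a = (__int128)a * a % p)
        if (e & 1) r = (__int128)r * a % p;
    return r;
}

// false = certainly not a perfect square; true = a perfect square with overwhelming probability
bool isProbablySquare(const string& n, int iterations = 30) {
    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    for (int it = 0; it < iterations; ++it) {
        ll p;
        do { p = rng() % 1000000000 + 3; } while (!isPrime(p));    // a random prime around 10^9
        ll rem = 0;
        for (char digit : n) rem = (rem * 10 + (digit - '0')) % p; // n mod p, digit by digit
        if (rem == 0) continue;                                    // 0 is a quadratic residue
        if (modPow(rem, (p - 1) / 2, p) != 1) return false;        // Euler's criterion says "non-residue"
    }
    return true;
}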

Thanks for reading, and check back for this week's summary!

A Newton week

The Dec 11 - Dec 17 week contained the last Open Cup round of the year: the Grand Prix of Peterhof (problems, results, top 5 on the left). The deciding problem H required the ability to find the exponent of a formal power series fast, and the teams that were able to do so claimed the first two places — congratulations to the Moscow IPT team and to SPb Havka-papstvo!
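
For context, here is the simple O(n^2) recurrence for the exponent of a formal power series, modulo a prime of my choosing; the fast versions typically replace it with Newton's iteration on top of NTT to get O(n*log n), but the underlying identity is the same: if B = exp(A), then B' = A'*B.

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
const ll MOD = 998244353;   // an NTT-friendly prime, chosen arbitrarily here

ll power(ll a, ll e) {
    ll r = 1;
    for (a %= MOD; e > 0; e >>= 1, a = a * a % MOD)
        if (e & 1) r = r * a % MOD;
    return r;
}

// b = exp(a) as a formal power series, truncated to a.size() terms; requires a[0] == 0.
// From b' = a'*b we get k*b_k = sum over i of i*a_i*b_(k-i).
vector<ll> seriesExp(const vector<ll>& a) {
    int n = a.size();
    vector<ll> b(n, 0);
    b[0] = 1;
    for (int k = 1; k < n; ++k) {
        ll s = 0;
        for (int i = 1; i <= k; ++i)
            s = (s + (ll)i % MOD * a[i] % MOD * b[k - i]) % MOD;
        b[k] = s * power(k, MOD - 2) % MOD;   // divide by k using the modular inverse
    }
    return b;
}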

I found the relatively easy problem J very nice. You are given an integer with at most a million digits. You need to determine if it's a perfect square or not. You don't need to actually find the square root. Can you see how to get this problem accepted in the 8th minute of the contest?

Thanks for reading, and check back soon!

A correlation week

The Dec 4 - Dec 10 week was another of the calm ones, so let's use this post to discuss the interactive NEERC problem I mentioned in the previous one: you work with a device that takes a number a as input and computes a^d mod n using the following pseudocode:

1   modPow(a, d, n) {
2     r = 1;
3     for (i = 0; i < 60; ++i) {
4       if ((d & (1 << i)) != 0) {
5         r = r * a % n;
6       }
7       a = a * a % n;
8     }
9   }

However, instead of receiving the result of the computation, you only receive the time it took the device to execute it, which is computed in the following way: only the multiplications on lines 5 and 7 take nonzero time, and multiplying x by y modulo n takes (bits(x)+1)*(bits(y)+1), where bits(x) is the length of the binary representation of x without leading zeroes.

You know the number n but not the number d, and your goal is to find d after sending at most 30000 requests to the device. You know that n and d were chosen in the following way: first, two prime numbers p and q with bits(p)=bits(q)=30 were picked independently and uniformly at random. Then n was computed as p*q. Then the number m=(p-1)*(q-1) was computed. Then d was picked uniformly at random between 1 and m-1 inclusive, such that it is coprime with m.
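
For reference, here is a direct simulation of the device's timing model as stated above (a small helper of mine, useful later for predicting the cost of individual multiplications; the result of modPow is never revealed, only this time):

#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long u64;

int bits(u64 x) { return x == 0 ? 0 : 64 - __builtin_clzll(x); }

u64 simulatedTime(u64 a, u64 d, u64 n) {
    u64 r = 1, total = 0;
    for (int i = 0; i < 60; ++i) {
        if (d & (1ULL << i)) {
            total += (u64)(bits(r) + 1) * (bits(a) + 1);   // cost of line 5: r = r * a % n
            r = (unsigned __int128)r * a % n;
        }
        total += (u64)(bits(a) + 1) * (bits(a) + 1);        // cost of line 7: a = a * a % n
        a = (unsigned __int128)a * a % n;
    }
    return total;
}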

The official analysis has a pretty detailed description of the intended solution starting from page 8, so let me just recap it here briefly:
  • We're going to just send 30000 random queries.
  • The lowest bit of d is always 1; let's determine the second lowest.
  • If it's also 1, then we multiply a by a^2, otherwise we don't.
  • For queries where a and/or a^2 have fewer bits than usual, the overall modPow time will also be less than usual, but only on average, so we can't tell this from a single query.
  • However, the correlation between the time to multiply a by a^2 and the overall modPow time, taken over all 30000 queries, can tell us this bit with extremely high certainty (a small sketch of this test follows the list).
  • We can then determine the next bit in the same manner, and so on.
  • This Wikipedia page also describes the same idea.
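
A tiny sketch of that correlation test: for each query we predict, from the already-recovered lower bits, the cost the multiplication at the current bit would have, and declare the bit to be 1 when the sample correlation with the observed total times is clearly positive. The names and the threshold below are illustrative, not taken from the official solution.

#include <bits/stdc++.h>
using namespace std;

double correlation(const vector<double>& x, const vector<double>& y) {
    int k = x.size();
    double mx = accumulate(x.begin(), x.end(), 0.0) / k;
    double my = accumulate(y.begin(), y.end(), 0.0) / k;
    double cov = 0, vx = 0, vy = 0;
    for (int i = 0; i < k; ++i) {
        cov += (x[i] - mx) * (y[i] - my);
        vx += (x[i] - mx) * (x[i] - mx);
        vy += (y[i] - my) * (y[i] - my);
    }
    return cov / sqrt(vx * vy);
}

// predictedCost[q] = cost the multiplication at the current bit would have in query q, if that bit is 1;
// observedTime[q] = the total time reported by the device for query q.
bool bitIsOne(const vector<double>& predictedCost, const vector<double>& observedTime) {
    // With 30000 queries the correlation of unrelated samples stays on the order of 0.01,
    // so even a small positive threshold separates the two cases.
    return correlation(predictedCost, observedTime) > 0.05;
}
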
This solution is very easy to code and generalizes well to bigger constraints, but it might be hard to intuitively feel whether 30000 queries give enough certainty. Egor has suggested another approach which is a bit harder to code but easier to convince oneself that it works (we have not actually tried to implement it, though): instead of sending random queries, we will send queries such that one repeated square of them is very small modulo n, say, at most 10 bits. Such a query will give much more signal on whether this repeated square is used in the multiplications, since multiplying 10 bits by 60 bits, compared to multiplying 60 bits by 60 bits, is about 3000 nanoseconds faster, and the standard deviation of the modPow computation time on random inputs is on the order of 1700, so the middle point is roughly one standard deviation away from each expectation. If we try, say, 49 such numbers for every bit, then we'll have seven standard deviations and will always guess correctly, so we need only 60*49, which is about 3000 queries, to find all bits.

Of course, finding a number such that a certain power of it has few bits modulo n is a hard task in itself, but here n is small enough that we can factorize it using Pollard's rho algorithm, and after that we just need to invert the said power modulo phi(n) to be able to find the root of this power of any number, including ones with few bits.
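
Pollard's rho itself is short; here is a compact sketch for the ~60-bit n of this problem (Floyd's cycle detection, with my own choice of the random constant in the iterated map):

#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long u64;

u64 mulmod(u64 a, u64 b, u64 n) { return (unsigned __int128)a * b % n; }

// Returns a nontrivial factor of n; assumes n is odd, composite and has no tiny factors,
// which holds here since n is a product of two 30-bit primes.
u64 pollardRho(u64 n) {
    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    while (true) {
        u64 c = rng() % (n - 1) + 1;            // random constant of the map x -> x^2 + c
        u64 x = rng() % n, y = x, d = 1;
        while (d == 1) {
            x = (mulmod(x, x, n) + c) % n;      // tortoise: one step
            y = (mulmod(y, y, n) + c) % n;      // hare: two steps
            y = (mulmod(y, y, n) + c) % n;
            d = __gcd(x > y ? x - y : y - x, n);
        }
        if (d != n) return d;                   // found a proper factor; otherwise retry with a new c
    }
}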

Thanks for reading, and check back for more!