神刀安全网

C++ in Competitive Programming: I/O

Welcome back to my series about Competitive Programming. Here is the introduction in case you missed it.

In this post I’ll explain some common idioms to deal with input and output.

In C++ a simple task like reading an integer from the standard input can be done in different ways: using streams, C functions or OS-dependant calls. The streams model offers a pretty high level interface, but it is generally slower than using native operating system calls. However, in different cases, it is acceptable.

I have solved a lot of challenges and very rarely I had to switch to C functions (e.g. scanf) or turn off the synchronization between C++ streams and standard C streams after each input/output operation (by using std::ios_base::sync_with_stdio ). Most of the time the I/O is not the point of the exercise then we can use the convenient streams model. This point seems irrelevant but it brings about simplification, enabling us to write not only simpler but also safer idioms. We’ll see that in some cases the streams interface is enough for basic parsing as well.

Due to the nature of the challenges – generally not focusing on I/O – I remark that the following idioms are pragmatic : they “work here”, but they may not fit a production environment where streams are often avoided like the plague. As I have read in a comment on reddit : “I fear that it may lead some people to prefer quick and functional solutions that work for some cases, neglecting important traits such as scalability, performance or cross-platform compatibility”. I think that solving challenges is about prototyping a solution which has to cover some sensible scenarios cleverly set up by challenge moderators. “Production” is another business. Definitely. Requirements evolve, scalability matters, etc. Being well-versed in Competitive Programming is not enough, sure, but I think that solving challenges is an entertaining way to face new problems that maybe – at some point – you’ll deal with for “real life purposes”.

I repeat a point from the introduction that I consider the most important: competitive programming is guaranteed brain excercise : “on any given day, what you’re asked to do at work may or may not be challenging, or you may be exercising one of the many non-technical skills required of software developers. By practicing regularly with competitive programming problems, you can ensure that the coding part of your brain receives a regular workout. It’s like one of those brain training apps, but for real skills”. In addition to this, other eleven reasons are wisely explained in this nice article which I recommend again.

After this short additional introduction, let me show some code about I/O.

Just for completeness, in the simplest cases you only need to read lonely values like numbers or strings:

int N, M; cin >> N >> M;

Remember in these challenges you don’t need to validate the input (unless stated otherwise – never found).

In one of the most common scenarios you need to read a sequence of numbers, and you can do it by using the synergetic copy_n + istream_iterator + back_inserter trio. That’s our first idiom. Generally the length of the sequence is passed just before (and the numbers are separated by whitespaces):

reserve prepares the vector to host “length” elements – it’s possibly allocating memory. A note: I’d really prefer writing something like vector<int> sequence(reserve, length) where “reserve” here is just a tag. The same applies for resize, etc. And this would avoid initializer lists overload predominance:

I used copy_n instead of copy because not only is it clearer, but also convenient in case of more input to read (if I used  copy then I would need to pass an end-of-stream iterator, but passing istream_iterator<int>() is dangerous because copy goes on iterating until the stream gets invalid).

With ranges the code streamlines:

boundedreturns a range of exactly length elements, in this case from the standard input ( view::take is another possibility).

Since a default operator>> for vector is not defined, reading a matrix needs manual iteration (here using copy_n is even more convenient):

Remember we don’t want to create a library, because it couldn’t be used to solve challenges. For this reason we don’t introduce an operator>> for vector.

Other cases involve strings, not just words but also lines. Pay attention to getline: as I have recently blogged (for another reason, but the lesson learned was the same), getline is an unformatted function. This means it does not ignore leading delimiters ( newline by default). These innocent lines can lead to something you may don’t expect:

int N; string line;  cin >> N; getline(cin, line);

The problem here is that we want to ignore the separator between the int and the line. E.g.:

10'/n' a line representing some data

Since getline does not skip the leading ‘/n’, it will stop immediately, writing nothing into the string.

This problem is easy to solve by passing std::ws (a manipulator which discards leading whitespaces from an input stream) :

This succint version is valid too:

And here is how the ranges join the party:

In another recurring pattern the input appears in this form:

N T a1 a2 ... aN t1 t2 ... tT

N is the length of a sequence of numbers, T is the number of test cases. Generally this kind of challenges require you to operate on the sequence with some kind of clever pre-processing and to use each test-case value (or values) to perform an operation quickly. For example, for each test-case value t, output the sum of the first t elements in the sequence.

Here is the simplest code to use for reading inputs in this scenario:

We’ll meet again this problem later in this series.

More complex input patterns can be handled by combining the examples above.

Printing answers

Just a few lines on output. Obviously, by using std::cout and some convenient standard abstractions like ostream_iterator .

Often the answers are just one number or two. In this case, just send them to the standard output applying the required formatting rules (e.g. space separated):

I generally don’t use endl  because it causes a stream flush and this may negatively affect the performance. For this reason I use just “/n” out of habit. In certain situations a printf is probably shorter, but it’s still error prone: not only if you write a wrong format, but also if you forget to update it in case you change a type (e.g. imagine you solve a problem by using int and then you change it to unsigned long long). With streams you don’t need to change anything because they automatically pick the correct operator<< overload.

When you need to print a sequence – like a vector – just use copy + ostream_iterator :

Note that the last separator is printed after the back element. This means extra care is needed to avoid it. For example:

Maybe in C++17 we’ll use the trailblazing ostream_joiner to save the extra line – since the “last” separator is not outputted at all:

Another worthwhile scenario is when you need to print floating point numbers and the output is expected to be an approximation of a certain number of digits. fixed and setprecision are good friends in this case. For example:

will print num1 and num2 with exactly two digits after the comma:

will print:

1.00 1.24

In case you need to print in a loop, it’s a good habit to set stream manipulators once, before the iteration:

I’ll be back on some standard mathematical utilities (e.g. rounding) later in this series.

Pushing streams to the limit

Before concluding, it’s worth sharing a couple of “beyond-the-basics” examples. Sometimes just configuring streams options is enough for solving some problems.

The first challenge required to tokenize a string based on given delimiters and to print the obtained tokens to the standard output, one per line. In other languages like Java you could use a StringTokenizer – indeed lots of Java guys use this opportunity on CC websites. Note that more complex challenges where parsing is the main topic does not allow you to adoperate standard things like streams or tokenizers (in other languages) – and they won’t be as efficient as the challenge requires!

This solution is not intended for production.So please don’t post comments like “streams are slow” or “who uses streams to tokenize?”. Here we have a different scenario. This code can be easily plugged into challenges requiring some basic parsing. By default streams skip whitespaces, here we need to skip also other delimiters.

Ranges library provides views like split and delimit for this kind of things:

Anyhow, let me go back to my modest C++14 empty sheet on HackerRank, I have only 30′ to complete the contest and Java boys are submitting two-line solutions…should I roll my own mini-parser?

C++ has a (a bit old-fashioned?) concept called facet to handle specific localization aspects. Basically, each facet corresponds to a precise localization setting that can be tuned. For example, the  ctype facet encapsulates character classification features – it can answer questions like “does this char is a space?” or “does this char is a digit?”. One can inherit from a base facet and change default behavior.

For our task we can inherit from ctype<char>, a specialization of std::ctype which encapsulates character classification features for type char . Unlike general-purpose std::ctype , which uses virtual functions, this specialization uses table lookup to classify characters (which is generally faster). Here is a solution to the challenge:

This requires a bit of clarification if you don’t know facets at all: ctype<char> uses a table to associate each char to its “classification” (this table is, indeed, called classification table ). The table size is standard ( ctype<char>::table_size – at least 256). We just need to set char delimiters to ctype_base::space . All std::istream formatted input functions are required to use std::ctype<char> for character classing during input parsing. For example, istream::operator>> asks to  ctype::is if a char is a space (aka a delimiter). Under the hood ctype<char> lookups the internal table.

Don’t worry about the lonely custom_delims allocation, the locale guarantees that the new facet will be deleted as soon as it’s not referenced anymore (facets are reference counted – another possible performance penalty, in addition to virtual function calls).

Although I never use such approach for parsing in my production code, in Competitive Programming it may be acceptable. For example, on HackerRank I submitted this code against a test-case requiring to split a string of 400’000 chars and it took (output included) 0.05s – within the time requirement of that challenge. And this code is easily reusable.

I promised two examples. The other was about number punctuation : given an integer number, print the string representation of the number, comma separated, supporting different comma styles – e.g. US 1000000 is 1,000,000, instead Indian 1000000 becomes 10,00,000. Another time, we may use the standard localization support:

The code is quite self-explanatory. For more details I encourage you to have a look at numpunct facet .

Hope you enjoyed this post because sometimes stream capabilities suffice in such competitions – consider them before rolling your own alternatives.

Summary

A brief recap of some weapons to add to our “pragmatic C++ competitive programmer toolkit”:

  • Usually I/O is not the point of the exercises, so  using streams is fine
  • To read a sequence of values into a vector use:
    • vector:: reserve (N), and
    • copy_n + istream_iterator + back_inserter
  • To print sequences use copy + ostream_iterator
  • To print floating points rounded to the nth-digit use fixed << setprecision(n)
  • If parsing is not the point of the challenge, consider  localization features

转载本站任何文章请注明:转载至神刀安全网,谢谢神刀安全网 » C++ in Competitive Programming: I/O

分享到:更多 ()

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
分享按钮