Tech Trivia: Linux: Unusually slow performance of grep/sort/other text processing commands

The grep manual does provide a subtle hint, but it’s one that English-speaking users can easily miss, since they rarely need to deal with MBS (multibyte strings).

The multibyte setting can have a disastrous impact on the performance of text processing utilities that rely on the operating system’s built-in regular expression processing. In plain English (and skipping the gritty details behind it), performance depends heavily on the LC_* environment variables. Read the * as a wildcard: any environment variable whose name starts with LC_, such as LC_ALL, LC_CTYPE or LC_COLLATE. LC_ALL, if set, overrides all the others, and LANG is the fallback for any category left unset.
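To see the effect on your own machine, a rough comparison like the one below may help. It is only a sketch: the file path and its contents are hypothetical, and it assumes the en_US.UTF-8 locale is installed (`locale -a` lists the installed ones). If that locale is missing, both runs fall back to the C locale and the timings will look the same.

```bash
#!/bin/bash
# Sketch: time the same grep under a UTF-8 locale and under the C locale.
# Assumption: en_US.UTF-8 is installed; otherwise both runs use the C locale.

# Build a hypothetical test file of one million lines.
seq 1 1000000 | sed 's/$/ some sample text/' > /tmp/lc_big.txt

# -i forces case folding, which has to be locale-aware.
time LC_ALL=en_US.UTF-8 grep -ic 'SAMPLE' /tmp/lc_big.txt
time LC_ALL=C grep -ic 'SAMPLE' /tmp/lc_big.txt
```

Both commands should report the same count; only the elapsed time is expected to differ, and by how much depends on your grep version and data.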

If you are sure you don’t need multibyte processing for the commands you are running, just set LC_ALL=C (or LC_ALL=POSIX) before running grep, sort or whichever text processing command you want. That should do the trick.
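A minimal sketch of the two usual ways to apply this, assuming bash and a hypothetical sample file: prefix the assignment to a single command, or export it for the rest of the session or script.

```bash
#!/bin/bash
# Hypothetical sample data; substitute your own files.
printf 'banana\napple\nCherry\napple pie\n' > /tmp/lc_demo.txt

# Option 1: prefix the assignment so it applies to this one command only.
LC_ALL=C grep -c 'apple' /tmp/lc_demo.txt   # prints 2

# The C locale compares raw bytes, so "Cherry" (uppercase C)
# sorts before every lowercase entry.
LC_ALL=C sort /tmp/lc_demo.txt

# Option 2: export it for the rest of the shell session (or script).
export LC_ALL=C
grep 'apple' /tmp/lc_demo.txt | sort
```

Note that the C locale also changes sort order: the `sort` above prints Cherry, apple, apple pie, banana, which differs from what a UTF-8 locale would typically give. Prefixing per command (option 1) keeps that side effect contained.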

And if you do need multibyte processing, well… life isn’t half as rosy, or so it seems, since the bug to fix this (in grep, at least) is still open!

For those who want to dig deeper, here’s a thread that may help. There’s also a bug filed against GNU grep on this.
