10 Command-line Tools for Data Analysis in Linux _ Opensource.com

The document discusses ten command-line tools for data analysis in Linux, emphasizing their efficiency and power compared to traditional spreadsheet applications. It introduces tools like head, tail, wc, grep, tr, sort, sed, cut, uniq, and awk, providing examples of how they can be used for various data manipulation tasks. The author encourages users to explore these tools further, highlighting their capability to handle large datasets effectively.

Uploaded by

RaimondsPantelis


2/5/2020 10 command-line tools for data analysis in Linux | Opensource.com

10 command-line tools for data analysis in Linux
Why load everything into a spreadsheet when the terminal
can be faster, more powerful, and more easily scriptable?
23 Feb 2017 Jason Baker (Red Hat)

So you've landed on some data you want to analyze.
Where do you begin?

Many people used to working in a graphical environment might default to using
a spreadsheet tool, but there's another way that might prove to be faster and
more efficient, with just a little more effort. And you don't need to become an
expert in a statistical modeling language or a big data toolset to take
advantage of these tools.


https://opensource.com/article/17/2/command-line-tools-data-analysis-linux?intcmp=7016000000127cYAAQ 1/12

I'm talking about the Linux command line. Just using some tools that you've
probably already got installed on your computer, you can learn a lot about a
dataset without ever leaving your terminal. Long-time Linux users will of course
laugh—they've been using many of these tools for years to parse logs and
understand configuration tools. But for the Linux newcomer, the revelation that
you've got a whole data analysis toolkit already at your fingertips can be a
welcome surprise.

Most of these tools aren't strictly speaking limited to Linux, either. Most
hearken back to the days of Unix, and users of other Unix-like operating
systems likely have them installed already or can do so with ease. Many are a
part of the GNU Coreutils package, while a few are individually maintained,
and with some work, you can even use them on Windows.


So let's try out a few of the many simple open source tools for data analysis
and see how they work! If you'd like to follow along with these examples, go
ahead and download this sample data file from GitHub, which is a CSV
(comma-separated values) list of articles we published to Opensource.com in
January.

head and tail


First, let's get started by getting a handle on the file. What's in it? What does its
format look like? You can use the cat command to display a file in the terminal,
but that's not going to do you much good if you're working with files longer than a
few dozen lines.

Enter head and tail. Both are utilities for showing a specified number of lines
from the top or bottom of the file. If you don't specify the number of lines you
want to see, you'll get 10. Let's try it with our file.

$ tail -n 3 jan2017articles.csv
02 Jan 2017,Article,Scott Nesbitt,3 tips for effectively using wikis for documenta
02 Jan 2017,Article,Jen Wike Huger,The Opensource.com preview for January,0,/artic
02 Jan 2017,Poll,Jason Baker,What is your open source New Year's resolution?,1,/po

Looking at those last three lines, I can pick out a date, author name, title, and a
few other pieces of information immediately. But I don't know what every
column is. Let's look at the top of the file and see if it has headers to explain
what each column means:

$ head -n 1 jan2017articles.csv
Post date,Content type,Author,Title,Comment count,Path,Tags,Word count

Okay, that all makes sense now. Looks like we've got a list of articles with the
date they were published, the type of content for each one, the author, title,
number of comments, relative URL, the tags each article has, and the word
count.
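If you don't have the sample file handy, here is a minimal sketch of the same inspection steps on a throwaway file. The file name sample.csv and its three rows are made up for illustration and are not the article's data. It also shows tail -n +2, which starts output at line 2 and so skips the header row.

```shell
# Build a tiny, made-up CSV in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
Post date,Author,Word count
03 Jan 2017,Alice Example,500
02 Jan 2017,Bob Example,750
01 Jan 2017,Carol Example,320
EOF

# First line only: the header row.
header=$(head -n 1 "$workdir/sample.csv")

# Last two lines of the file.
last_two=$(tail -n 2 "$workdir/sample.csv")

# Everything except the header: tail -n +2 starts output at line 2.
body=$(tail -n +2 "$workdir/sample.csv")

printf '%s\n' "$header" "$last_two"
rm -rf "$workdir"
```

Capturing the output in variables is only there to make the sketch easy to check; at the prompt you would simply run the head and tail commands directly.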

wc

That's great, but how big is this file? Are we talking about dozens of articles we
want to analyze, or hundreds, or even thousands? The wc command can help
with that. Short for "word count," wc can count the number of bytes,
characters, words, or lines in the file. In our case, we want to know the number
of lines.

$ wc -l jan2017articles.csv
93 jan2017articles.csv


And, there it is. 93 lines in this file; since we know the first row contained
headers, we can surmise that this is a list of 92 articles.
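The subtraction of the header row can be scripted too. This is a small sketch on a made-up file (sample.csv and its contents are assumptions for illustration). Reading the file through < makes wc print only the number, and tr removes the padding that some wc implementations add.

```shell
# Hypothetical three-line file: one header row plus two records.
workdir=$(mktemp -d)
printf 'Title,Word count\nFirst post,500\nSecond post,750\n' > "$workdir/sample.csv"

# wc -l < file prints just the count, without the file name.
total_lines=$(wc -l < "$workdir/sample.csv" | tr -d ' ')

# Subtract the header row to get the number of records.
records=$((total_lines - 1))

echo "lines: $total_lines, records: $records"
rm -rf "$workdir"
```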

grep

Okay, now let's ask ourselves: Out of these 92 articles, how many of them
were about a security topic? For our purposes, let's say we're interested in
articles that mention security anywhere in the entry, whether in the title, the list
of tags, or somewhere else. The grep tool can help us with that. With grep,
you can search a file or other input for a particular pattern of
characters. grep is an incredibly powerful tool, thanks to the regular
expressions you can build to match very precise patterns. But for now, let's just
search for a simple string.

$ grep -i "security" jan2017articles.csv

30 Jan 2017,Article,Tiberius Hefflin,4 ways to improve your security online right
28 Jan 2017,Article,Subhashish Panigrahi,How communities in India support privacy
27 Jan 2017,Article,Alan Smithee,Data Privacy Day 2017: Solutions for everyday pri
04 Jan 2017,Article,Daniel J Walsh,50 ways to avoid getting hacked in 2017,14,/art

The format we used was grep, followed by the -i flag (which tells grep not to
be case sensitive), followed by the pattern we wanted to search for, and then
the file in which we were searching. It looks like we had four security-related
articles last month. But let's imagine we got a much longer list than we could
easily count. Using a pipe, we could combine grep with the wc command we
just learned about above, to get a count of the total lines mentioning security.

$ grep -i "security" jan2017articles.csv | wc -l

In this case, wc took the output of our grep command, and used it as its input,
without ever having to worry about saving it anywhere first. This is why piping
input and output, in particular when combined with a little shell scripting, makes
the terminal such a powerful tool for data analysis.
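As a sketch of the same idea on made-up data (the file and its rows are hypothetical), the pipe into wc and grep's own -c option, which counts matching lines, should give the same number. The pipe is still worth knowing because it works with any command, not just grep.

```shell
# Hypothetical file: a header row and three entries, two mentioning security.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
Title,Tags
Improve your Security online,privacy
Writing web apps,programming
Avoid getting hacked,security
EOF

# Count matching lines by piping grep into wc.
piped=$(grep -i "security" "$workdir/sample.csv" | wc -l | tr -d ' ')

# grep -c counts matching lines itself, so the pipe is optional here.
direct=$(grep -ic "security" "$workdir/sample.csv")

echo "piped: $piped, direct: $direct"
rm -rf "$workdir"
```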


tr

A CSV file is a pretty helpful format for many analysis scenarios, but what if
you need to convert the file to a different format for use in a different
application? Maybe you need tab separators instead of commas, or maybe you
want to change them to some HTML so that you can use the data output in a
table. The tr command can help you with that, by translating from one type of
character to another. Like the other examples, you can also pipe input and
output to this command.

Let's try another multi-part example, by creating a TSV (tab-separated values)
file with just the articles that were published on January 20.

$ grep "20 Jan 2017" jan2017articles.csv | tr ',' '\t' > jan20only.tsv

What's going on here? First, we searched for the date in question, using grep.
We piped this output to the tr command, which we used to replace the
commas with tabs (denoted with '\t'). But where did it go? Well, the > character
redirected the output to our new file instead of the screen. All of this work in
one command sequence. We can then verify that the jan20only.tsv file contains
the data that we expected.

$ cat jan20only.tsv
20 Jan 2017 Article Kushal Das 5 ways to expand your project's contributo
20 Jan 2017 Article D Ruth Bavousett How to write web apps in R with Sh
20 Jan 2017 Article Jason Baker "Top 5: Shell scripting the Cinnamon Linu
20 Jan 2017 Article Tracy Miranda How is your community promoting diversity?
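One thing worth keeping in mind is that tr works on characters and knows nothing about CSV quoting, so a comma inside a quoted title is translated as well. The sketch below uses two made-up rows (not taken from the sample file) and counts the resulting tab-separated fields to show the effect.

```shell
# A plain row converts cleanly: every comma becomes a tab.
plain=$(printf '20 Jan 2017,Article,Some Author\n' | tr ',' '\t')

# A row whose title contains a comma inside quotes gains an extra column,
# because tr has no notion of CSV quoting.
quoted=$(printf '20 Jan 2017,"Top 5: shells, editors",Some Author\n' | tr ',' '\t')

# Count tab-separated fields with awk to see the difference.
plain_fields=$(printf '%s\n' "$plain" | awk -F '\t' '{print NF}')
quoted_fields=$(printf '%s\n' "$quoted" | awk -F '\t' '{print NF}')

echo "plain: $plain_fields fields, quoted: $quoted_fields fields"
```

For a quick look at the data this is usually acceptable, but it is a reason to double-check column positions before relying on them.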

sort

What if we wanted to learn more details about a particular column? Which
article in our new list of articles is the longest? Let's build on our last example.
Now that we have a list of articles from just January 20, we can use
the sort command to sort by the word count column. Of course, we don't
strictly speaking need an intermediate file here; we could have piped the
output of the last command instead. But sometimes it's simply easier to break
long steps into smaller ones rather than creating gigantic chains of commands.


$ sort -nr -t$'\t' -k8 jan20only.tsv | head -n 1

20 Jan 2017 Article Tracy Miranda How is your community promoting diversity?

This is another long example, so let's break down what's happening. First,
we're using the sort command to sort by the number of words. The -nr option
tells sort to do a numeric sort, and to return the results in reverse order (largest
to smallest). The next option, -t$'\t', tells sort that the delimiter is the tab ('\t'). (The
dollar sign is needed to tell the shell that this is a string that needs processing
to turn the \t into an actual tab.) The -k8 portion
of the command tells sort to use the eighth column, which is the column for
word count in our example.

Finally, the whole output is piped to head with instructions just to show the top
line, which is our result, the article from this file with the highest word count.
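To see why the -n matters, here is a minimal sketch on a made-up two-column TSV (the names and numbers are hypothetical). It builds the tab with printf instead of $'\t' so it also runs in shells without that syntax, and it uses -k2,2 to limit the sort key to the second column only.

```shell
# Hypothetical tab-separated file: name, word count.
workdir=$(mktemp -d)
tab=$(printf '\t')
printf 'Alice\t690\nBob\t1007\nCarol\t218\n' > "$workdir/sample.tsv"

# -t sets the delimiter, -k2,2 restricts the key to column 2 only,
# -n sorts numerically, -r reverses so the largest comes first.
longest=$(sort -t "$tab" -k2,2 -nr "$workdir/sample.tsv" | head -n 1)

# Without -n the comparison is textual, so "690" sorts above "1007".
textual=$(sort -t "$tab" -k2,2 -r "$workdir/sample.tsv" | head -n 1)

printf '%s\n' "$longest" "$textual"
rm -rf "$workdir"
```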

sed

You might want to select specific lines of a file. sed, short for stream editor, is
one way to do this. What if you wanted to combine multiple files that all had
headers? You would only want one set of headers to appear for the whole file,
so you would need a way to scrape out the extras. Or what if you wanted to
grab only a particular range of lines? sed is your tool. It's also a great way to
do a bulk find and replace in a file.

Let's create a new file with no headers from our list of articles, suitable for
combining with other files (if, for example, I had a different file for every month
and wanted to put them together).

$ sed '1 d' jan2017articles.csv > jan17no_headers.csv

The '1 d' option tells sed to delete the first line. sed is far more powerful than
this, and I'd recommend reading up further on its replacement powers.
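The three uses mentioned above (dropping a header, grabbing a range of lines, and find and replace) can be sketched on a small made-up file. The file contents and the replacement word "Post" are assumptions chosen only for illustration.

```shell
# Hypothetical file: a header row and three records.
workdir=$(mktemp -d)
printf 'Author,Type\nAlice,Article\nBob,Poll\nCarol,Article\n' > "$workdir/sample.csv"

# Delete line 1 (the header), as in the article's example.
no_header=$(sed '1d' "$workdir/sample.csv")

# Print only lines 2 through 3; -n suppresses the default output.
range=$(sed -n '2,3p' "$workdir/sample.csv")

# Replace every occurrence of "Article" with "Post" on each line.
replaced=$(sed 's/Article/Post/g' "$workdir/sample.csv")

printf '%s\n' "$range"
rm -rf "$workdir"
```

None of these commands modify the original file; sed writes to standard output, which is why the article's example redirects it to a new file.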

cut

What if, instead of wanting to remove a row, I wanted to remove a column?
What if I wanted to pick out just one column? Let's create a new list of authors
from the example we built above.

$ cut -d',' -f3 jan17no_headers.csv > authors.txt

In this simple example, we told cut with -d',' that this is a comma-delimited file,
that we wanted the third column (-f3), and to send the output to a new file
called authors.txt.
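cut can also pull out several columns or a range in one go. A short sketch, again on made-up rows rather than the sample file:

```shell
# Hypothetical comma-delimited file: date, type, author.
workdir=$(mktemp -d)
printf '01 Jan 2017,Article,Alice\n02 Jan 2017,Poll,Bob\n' > "$workdir/sample.csv"

# Single column: field 3 (author).
authors=$(cut -d',' -f3 "$workdir/sample.csv")

# Several columns at once: fields 1 and 3, still comma-separated.
date_author=$(cut -d',' -f1,3 "$workdir/sample.csv")

# A range: field 2 through the end of the line.
rest=$(cut -d',' -f2- "$workdir/sample.csv")

printf '%s\n' "$date_author"
rm -rf "$workdir"
```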

uniq

That last example left us with a list of authors, but, how many unique authors
are on that list? How many articles did each author write? Enter uniq.
With uniq, you can easily find out. Let's sort the file, find uniques, then output
a file that has a count of the number of articles written by each author.

$ sort authors.txt | uniq -c > authors-sorted.txt

Glancing at the file, we can now see how many articles each author had. Let's
just look at the last three lines to be sure it worked.

$ tail -n3 authors-sorted.txt

1 Tracy Miranda
1 Veer Muchandi
3 VM (Vicky) Brasseur
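The sort in front of uniq is not optional: uniq only collapses lines that are next to each other. The sketch below, on a made-up list of names, shows that and adds a second sort to rank authors by their counts.

```shell
# Hypothetical author list with repeated, non-adjacent names.
workdir=$(mktemp -d)
printf 'Bob\nAlice\nBob\nCarol\nBob\nAlice\n' > "$workdir/authors.txt"

# Sort first so duplicates are adjacent, count them, then order by count,
# largest first.
ranked=$(sort "$workdir/authors.txt" | uniq -c | sort -nr)

# The top line is the most frequent author; awk trims uniq's leading padding.
top=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $1, $2}')

# Without the first sort, non-adjacent duplicates are counted separately.
unsorted_groups=$(uniq -c "$workdir/authors.txt" | wc -l | tr -d ' ')

echo "top: $top, groups without sorting: $unsorted_groups"
rm -rf "$workdir"
```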

awk

Let's look at one more tool in our command-line data analysis toolbelt
today, awk. awk is another one of those tools that I'm going to give far too little
credit to; it's really a powerhouse worth exploring on its own. It is another great
tool for replacement, but also much more. Let's go back to the TSV file we
made earlier of just the January 20 articles, and use that to create a new list of
just the authors of those articles, along with the number of words each author
wrote.

$ awk -F "\t" '{print $3 "  " $NF}' jan20only.tsv
Kushal Das  690
D Ruth Bavousett  218
Jason Baker  214
Tracy Miranda  1007

What's going on here? The -F "\t" we pass to awk simply tells it that we're
working with tab-separated data. Within the braces, we're actually
telling awk to execute just a little bit of code. We're telling it to print the third
column with $3, and then the last column with $NF (the "number of fields"), and
place two spaces between them to make it a little more legible.
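Because awk runs a bit of code per line, it can also do arithmetic across lines. As a hedged sketch on made-up tab-separated rows (the dates, names, and numbers are hypothetical), this totals the last column overall and per author:

```shell
# Hypothetical TSV: date, type, author, word count.
workdir=$(mktemp -d)
printf 'd1\tArticle\tAlice\t690\nd2\tArticle\tBob\t218\nd3\tArticle\tAlice\t310\n' > "$workdir/sample.tsv"

# Add up the last column for the whole file.
total=$(awk -F '\t' '{sum += $NF} END {print sum}' "$workdir/sample.tsv")

# Add up the last column per author (column 3); sort makes the order stable,
# since awk's for-in iteration order is unspecified.
per_author=$(awk -F '\t' '{words[$3] += $NF} END {for (a in words) print a, words[a]}' \
    "$workdir/sample.tsv" | sort)

echo "total: $total"
printf '%s\n' "$per_author"
rm -rf "$workdir"
```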

So what? Can't we do all of this faster in a spreadsheet, or just by looking at
the file in some cases? Sure we can! Now stop and imagine that instead of a
93-line file, we were working with a 93,000-line file, or an even larger one. Can your
spreadsheet utility load it without crashing or slowing down significantly? Or
imagine instead of one file with one month's worth of articles, you had a
different file for every month of the past seven years. Suddenly, a spreadsheet
isn't the best option for processing your data, but you're not nearly in the
territory yet where you need a true big data tool to work with your dataset.

You could choose to load the files into a database tool and work with the data
there. But is that the right choice? It might be overkill. What if you're just
examining the data to get a sense of what it contains? With these simple tools
and a little scripting to recurse through a directory, you can work with large
amounts of data with ease. Professionals and amateurs alike who work with
data on a regular basis would do well to spend some time learning these and
other command line data analysis tools.
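As one possible sketch of that "little scripting", the loop below walks a directory tree, keeps a single header row, and appends the body of every monthly file to one combined file. The directory layout, file names, and contents are all made up for illustration, and the approach assumes every file shares the same header and that file names contain no newlines.

```shell
# Two hypothetical monthly files in different directories, each with a header row.
workdir=$(mktemp -d)
mkdir -p "$workdir/2016"
printf 'Author,Words\nAlice,500\nBob,300\n' > "$workdir/2016/dec2016articles.csv"
printf 'Author,Words\nCarol,700\n' > "$workdir/jan2017articles.csv"

# Keep one header, then append every file's body (everything after line 1).
# The output name does not end in .csv, so find will not pick it up.
combined="$workdir/combined.out"
head -n 1 "$workdir/jan2017articles.csv" > "$combined"
find "$workdir" -name '*.csv' -type f | sort | while IFS= read -r f; do
    tail -n +2 "$f" >> "$combined"
done

total_rows=$(wc -l < "$combined" | tr -d ' ')
first_line=$(head -n 1 "$combined")
echo "combined rows including header: $total_rows"
rm -rf "$workdir"
```

From there, the combined file can be fed to any of the tools above exactly as the single-month file was.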

This introduction only scratches the surface of each of these tools. They are
far, far more powerful than these simple examples would let on, which is why
volumes of books have been written about most. I hope you'll take the time to
read a man page, do a little work in the search engine, or pick up a book and
learn more about this interesting suite of tools that you have ready at your
fingertips.


About the author

Jason Baker - I use technology to make the world more open. Linux desktop
enthusiast. Map/geospatial nerd. Raspberry Pi tinkerer. Data analysis and
visualization geek. Occasional coder. Cloud nativist. Civic tech and open
government booster.


11 Comments

Greg Pittman on 24 Feb 2017 1

I never get around to it, but I keep suggesting to myself that I need to make my own
references for these commands. I will find a use for one of them, spend some time sorting
out the syntax, translating the man pages to something that is human-readable, and do
something pretty cool. Unfortunately, if I haven't used them in a while, I forget the details,
and have to relearn.

Jason Baker on 24 Feb 2017 1

Remembering the exact argument format and sequence is definitely a challenge with
these tools; I do keep my own notes for exactly that purpose. For most of the tools
(excepting sed and awk, for sure), rather than wading through the man pages,
running them with --help will give you enough of an idea to eke out what you need.

Jasonthegreat on 24 Feb 2017 1

The only one I really don't use is tr. I find it easier to convert my files to XML using a sed or
awk script, which can then be easily converted to Spreadsheet XML 2003 using an XSLT with
custom formatting.

Rob Kellington on 25 Feb 2017 2


Thanks for this.

One comment ... the line "sort authors.txt | uniq -c > authors.txt" I think empties out the
authors.txt

Jason Baker on 27 Feb 2017 1

Good catch; I changed the output file to have a different name. Thanks!

Michael Mehlhorn on 28 Feb 2017 1

I use all of these tools. You can do crazy things with them. But in most cases it only makes
sense for batch work. You forgot Perl, which covers everything the tools you mentioned can do.
There is even a Perl module to create Excel sheets. ;-)

andy on 28 Feb 2017 1

A common pipeline we use is:
grep /tmp/file | cut -d: -f1,2 | sort | uniq -c

e.g., count the hits by an IP per minute:

grep 24.333.222.111 /var/log/httpd/access_log | cut -d: -f2,3 | sort | uniq -c
70 23:45
74 23:46
76 23:47
36 23:48

perl gives you all the power of sed and awk and grep combined and extended. It works very
well in a pipeline.

Edgar Fuentes on 28 Feb 2017 1

Jason, when you say "with just a little more effort", I think you must clarify that this means
learning the tools, because once you become familiar with them, that translates into a lot
less effort. Another important note is that you could develop your own custom tools; at that
point you will be a ninja. Thanks!

Sarah Thornton on 02 Mar 2017 1

Great post :-) Another command line tool that's really handy is "Petit". It can hash a file
which is very useful when looking through log files where a datestamp may change on a line
and you want to group the events together. For example, if you're looking for all behavior on
a url or an IP address, you can summarize the logs and get a clearer picture. I have found it
useful when I see a script hit a page multiple times and want a list of IPs for that one URL.

Handy little tool :-)

Shayne Riley on 07 Mar 2017

Two command-line tools I use quite often for data analysis are jq and q:
- jq is a "Command-line JSON processor". Sometimes I get lengthy and ugly JSON
responses from my curl commands. I'll pipe it into jq and strip out all the JSON parts I don't
want, and it's pretty-printed too.
- q allows me to run SQL-like queries on CSV files. I can even do joins! Sure, it's only useful
for people that know SQL queries, which isn't as many as I'd hope, but since I know how to
do them, it comes in handy.
