031 Data Wrangling With Mongodb

Uploaded by

Nguyễn Đăng

031-data-wrangling-with-mongodb

April 23, 2022

3.1. Wrangling Data with MongoDB

[2]: from pprint import PrettyPrinter

import pandas as pd
from IPython.display import VimeoVideo
from pymongo import MongoClient

[3]: VimeoVideo("665412094", h="8334dfab2e", width=600)

[3]: <IPython.lib.display.VimeoVideo at 0x7fb54447b460>

[4]: VimeoVideo("665412135", h="dcff7ab83a", width=600)

[4]: <IPython.lib.display.VimeoVideo at 0x7fb54447b1c0>

Task 3.1.1: Instantiate a PrettyPrinter, and assign it to the variable pp.

• Construct a PrettyPrinter instance in pprint.
[5]: pp = PrettyPrinter(indent = 2)
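
As a rough illustration of what `indent=2` changes, the sketch below formats a nested dictionary with `pformat`, which returns the formatted string instead of printing it. The sample document is a trimmed, hand-typed copy of the kind of record retrieved later in this notebook; `sample_doc` and `text` are illustrative names, not part of the lesson.

```python
from pprint import PrettyPrinter

# Hand-typed sample for illustration; not fetched from the database.
sample_doc = {
    "P1": 39.67,
    "metadata": {
        "lat": -1.3,
        "lon": 36.785,
        "measurement": "P1",
        "sensor_id": 57,
        "sensor_type": "SDS011",
        "site": 29,
    },
}

pp = PrettyPrinter(indent=2)

# The dictionary is too wide for one line, so it is wrapped
# with one key per line and nested levels indented.
text = pp.pformat(sample_doc)
print(text)
```

Because the dictionary does not fit within the default line width, each key lands on its own line, which is what makes long MongoDB documents readable.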

1 Prepare Data
1.1 Connect
[6]: VimeoVideo("665412155", h="1ca0dd03d0", width=600)

[6]: <IPython.lib.display.VimeoVideo at 0x7fb54447ba30>

Task 3.1.2: Create a client that connects to the database running at localhost on port 27017.
• What’s a database client?
• What’s a database server?
• Create a client object for a MongoDB instance.
[7]: client = MongoClient(host="localhost", port=27017)

1.2 Explore
[8]: VimeoVideo("665412176", h="6fea7c6346", width=600)

[8]: <IPython.lib.display.VimeoVideo at 0x7fb54437fd60>

[9]: from sys import getsizeof

my_list = [0,1,2,3,4,5] # list: all six elements are stored in memory
my_range = range(0,8_000_000) # range: a lazy sequence that computes values on demand

# for i in my_list:
#     print(i)

# for i in my_range:
#     print(i)

print(getsizeof(my_list))
print(getsizeof(my_range))

152
48
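
The size difference above comes from `range` computing its values on demand instead of storing them. Strictly speaking, a `range` is a lazy sequence that can be looped over repeatedly, while a true iterator, such as the one returned by `iter()`, can be consumed only once. The sketch below uses illustrative variable names to show both behaviors.

```python
from sys import getsizeof

lazy = range(0, 8_000_000)  # lazy sequence: stores only start, stop, step
small = range(0, 3)

# The size of a range object does not depend on how many values it covers.
same_size = getsizeof(lazy) == getsizeof(small)

# A range can be traversed more than once...
first_pass = list(small)
second_pass = list(small)

# ...but an iterator is exhausted after a single pass.
it = iter(small)
consumed = list(it)
leftover = list(it)

print(same_size, first_pass, second_pass, consumed, leftover)
```

PyMongo query results behave much like the iterator in this respect, which is why the notebook wraps them in `list(...)` before printing.
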
Task 3.1.3: Print a list of the databases available on client.
• What’s an iterator?
• List the databases of a server using PyMongo.
• Print output using pprint.
[10]: db_list = list(client.list_databases())
#print(getsizeof(db_list))
pp.pprint(db_list)

[ {'empty': False, 'name': 'admin', 'sizeOnDisk': 40960},
  {'empty': False, 'name': 'air-quality', 'sizeOnDisk': 6987776},
  {'empty': False, 'name': 'config', 'sizeOnDisk': 12288},
  {'empty': False, 'name': 'local', 'sizeOnDisk': 73728}]
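
Once the result is a plain list of dictionaries, ordinary Python is enough to summarize it. A small sketch follows, with the list copied by hand from the printed output rather than fetched from a server; `names`, `total_bytes`, and `largest` are illustrative names.

```python
# Copied from the printed output; not a live query.
db_list = [
    {"empty": False, "name": "admin", "sizeOnDisk": 40960},
    {"empty": False, "name": "air-quality", "sizeOnDisk": 6987776},
    {"empty": False, "name": "config", "sizeOnDisk": 12288},
    {"empty": False, "name": "local", "sizeOnDisk": 73728},
]

names = [d["name"] for d in db_list]
total_bytes = sum(d["sizeOnDisk"] for d in db_list)
largest = max(db_list, key=lambda d: d["sizeOnDisk"])["name"]

print(names)
print(f"{total_bytes / 1_000_000:.2f} MB on disk; largest: {largest}")
```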

[11]: VimeoVideo("665412216", h="7d4027dc33", width=600)

[11]: <IPython.lib.display.VimeoVideo at 0x7fb542b112b0>

Task 3.1.4: Assign the "air-quality" database to the variable db.

• What’s a MongoDB database?
• Access a database using PyMongo.
[12]: db = client["air-quality"]

[13]: VimeoVideo("665412231", h="89c546b00f", width=600)

[13]: <IPython.lib.display.VimeoVideo at 0x7fb542b118b0>

Task 3.1.5: Use the list_collections method to print a list of the collections available in db.
• What’s a MongoDB collection?
• List the collections in a database using PyMongo.
[14]: #list(db.list_collections())[0]
for c in db.list_collections():
    print(c["name"])

lagos
system.buckets.lagos
nairobi
system.buckets.nairobi
system.views
dar-es-salaam
system.buckets.dar-es-salaam
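
The `system.*` entries are collections that MongoDB maintains internally; the `system.buckets.*` names back time-series collections, and `system.views` holds view definitions. If only the collections meant for direct querying are of interest, a filter along these lines would do. The names are copied from the output above, and `user_collections` is an illustrative name.

```python
# Names copied from the printed output; not a live query.
collection_names = [
    "lagos",
    "system.buckets.lagos",
    "nairobi",
    "system.buckets.nairobi",
    "system.views",
    "dar-es-salaam",
    "system.buckets.dar-es-salaam",
]

# Keep only the collections that are not internal bookkeeping.
user_collections = [n for n in collection_names if not n.startswith("system.")]
print(user_collections)
```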

[15]: VimeoVideo("665412252", h="bff2abbdc0", width=600)

[15]: <IPython.lib.display.VimeoVideo at 0x7fb542b11a30>

Task 3.1.6: Assign the "nairobi" collection in db to the variable name nairobi.
• Access a collection in a database using PyMongo.
[16]: nairobi = db["nairobi"]

[17]: VimeoVideo("665412270", h="e4a5f5c84b", width=600)

[17]: <IPython.lib.display.VimeoVideo at 0x7fb542b32370>

Task 3.1.7: Use the count_documents method to see how many documents are in the nairobi
collection.
• What’s a MongoDB document?
• Count the documents in a collection using PyMongo.
[18]: nairobi.count_documents({})

[18]: 202212

[19]: VimeoVideo("665412279", h="c2315f3be1", width=600)

[19]: <IPython.lib.display.VimeoVideo at 0x7fb542b326d0>

Task 3.1.8: Use the find_one method to retrieve one document from the nairobi collection, and
assign it to the variable name result.
• What’s metadata?
• What’s semi-structured data?
• Retrieve a document from a collection using PyMongo.
[20]: result = nairobi.find_one({})
pp.pprint(result)

{ 'P1': 39.67,
  '_id': ObjectId('6261a046e76424a61615daaf'),
  'metadata': { 'lat': -1.3,
                'lon': 36.785,
                'measurement': 'P1',
                'sensor_id': 57,
                'sensor_type': 'SDS011',
                'site': 29},
  'timestamp': datetime.datetime(2018, 9, 1, 0, 0, 2, 472000)}
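
The tasks that follow refer to nested keys with dot notation, such as "metadata.site". The helper below, `get_path`, is a hypothetical pure-Python stand-in for that lookup, applied to a hand-typed copy of the document above. It is not a PyMongo function, only a sketch of how a dotted key walks through nested dictionaries.

```python
from datetime import datetime
from functools import reduce

# Hand-typed copy of the printed document (the ObjectId is omitted).
doc = {
    "P1": 39.67,
    "metadata": {
        "lat": -1.3,
        "lon": 36.785,
        "measurement": "P1",
        "sensor_id": 57,
        "sensor_type": "SDS011",
        "site": 29,
    },
    "timestamp": datetime(2018, 9, 1, 0, 0, 2, 472000),
}


def get_path(document, dotted_key):
    """Follow a dotted key such as 'metadata.site' through nested dicts."""
    return reduce(lambda level, key: level[key], dotted_key.split("."), document)


print(get_path(doc, "metadata.site"))
print(get_path(doc, "metadata.sensor_type"))
print(get_path(doc, "timestamp").isoformat())
```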

[21]: VimeoVideo("665412306", h="e1e913dfd1", width=600)

[21]: <IPython.lib.display.VimeoVideo at 0x7fb542b320a0>

Task 3.1.9: Use the distinct method to determine how many sensor sites are included in the
nairobi collection.
• Get a list of distinct values for a key among all documents using PyMongo.
[22]: nairobi.distinct("metadata.site")

[22]: [29, 6]

[23]: VimeoVideo("665412322", h="4776c6d548", width=600)

[23]: <IPython.lib.display.VimeoVideo at 0x7fb542b32eb0>

Task 3.1.10: Use the count_documents method to determine how many readings there are for
each site in the nairobi collection.
• Count the documents in a collection using PyMongo.
[24]: print("Documents from site 6:", nairobi.count_documents({"metadata.site":6}))
print("Documents from site 29:", nairobi.count_documents({"metadata.site":29}))

Documents from site 6: 70360
Documents from site 29: 131852

[25]: VimeoVideo("665412344", h="d2354584cd", width=600)

[25]: <IPython.lib.display.VimeoVideo at 0x7fb542b3d6d0>

Task 3.1.11: Use the aggregate method to determine how many readings there are for each site
in the nairobi collection.

• Perform aggregation calculations on documents using PyMongo.
[26]: result = nairobi.aggregate(
    [
        {"$group": {"_id": "$metadata.site", "count": {"$count": {}}}}
    ]
)
pp.pprint(list(result))

[{'_id': 29, 'count': 131852}, {'_id': 6, 'count': 70360}]
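
Conceptually, the `$group` stage buckets documents by the `_id` expression and applies an accumulator to each bucket. A rough in-memory analogue, using `collections.Counter` on a handful of made-up documents (the real collection has 202,212), might look like this:

```python
from collections import Counter

# Made-up miniature collection for illustration only.
docs = [
    {"metadata": {"site": 29, "measurement": "P1"}},
    {"metadata": {"site": 29, "measurement": "P2"}},
    {"metadata": {"site": 6, "measurement": "P2"}},
    {"metadata": {"site": 29, "measurement": "humidity"}},
    {"metadata": {"site": 6, "measurement": "temperature"}},
]

# Group by metadata.site and count the members of each group.
counts = Counter(d["metadata"]["site"] for d in docs)

# Shape the result like the aggregation output: one dict per group.
result = [{"_id": site, "count": n} for site, n in counts.items()]
print(result)
```

Note that `$count` as a `$group` accumulator requires MongoDB 5.0 or newer; on older servers, `{"$sum": 1}` yields the same per-group count.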

[27]: VimeoVideo("665412372", h="565122c9cc", width=600)

[27]: <IPython.lib.display.VimeoVideo at 0x7fb542b3d7c0>

Task 3.1.12: Use the distinct method to determine how many types of measurements have been
taken in the nairobi collection.
• Get a list of distinct values for a key among all documents using PyMongo.
[28]: nairobi.distinct("metadata.measurement")

[28]: ['P2', 'humidity', 'temperature', 'P1']

[29]: VimeoVideo("665412380", h="f7f7a39bb3", width=600)

[29]: <IPython.lib.display.VimeoVideo at 0x7fb542b3d610>

Task 3.1.13: Use the find method to retrieve the PM 2.5 readings from all sites. Be sure to limit
your results to 3 records only.
• Query a collection using PyMongo.
[30]: result = nairobi.find({"metadata.measurement":"P2"}).limit(3)
pp.pprint(list(result))

[ { 'P2': 34.43,
    '_id': ObjectId('6261a046e76424a616165b3a'),
    'metadata': { 'lat': -1.3,
                  'lon': 36.785,
                  'measurement': 'P2',
                  'sensor_id': 57,
                  'sensor_type': 'SDS011',
                  'site': 29},
    'timestamp': datetime.datetime(2018, 9, 1, 0, 0, 2, 472000)},
  { 'P2': 30.53,
    '_id': ObjectId('6261a046e76424a616165b3b'),
    'metadata': { 'lat': -1.3,
                  'lon': 36.785,
                  'measurement': 'P2',
                  'sensor_id': 57,
                  'sensor_type': 'SDS011',
                  'site': 29},
    'timestamp': datetime.datetime(2018, 9, 1, 0, 5, 3, 941000)},
  { 'P2': 22.8,
    '_id': ObjectId('6261a046e76424a616165b3c'),
    'metadata': { 'lat': -1.3,
                  'lon': 36.785,
                  'measurement': 'P2',
                  'sensor_id': 57,
                  'sensor_type': 'SDS011',
                  'site': 29},
    'timestamp': datetime.datetime(2018, 9, 1, 0, 10, 4, 374000)}]

[31]: VimeoVideo("665412389", h="8976ea3090", width=600)

[31]: <IPython.lib.display.VimeoVideo at 0x7fb542b3d2b0>

Task 3.1.14: Use the aggregate method to calculate how many readings there are for each type
("humidity", "temperature", "P2", and "P1") in site 6.
• Perform aggregation calculations on documents using PyMongo.
[32]: result = nairobi.aggregate(
    [
        {"$match": {"metadata.site": 6}},
        {"$group": {"_id": "$metadata.measurement", "count": {"$count": {}}}}
    ]
)
pp.pprint(list(result))

[ {'_id': 'P2', 'count': 18169},
  {'_id': 'humidity', 'count': 17011},
  {'_id': 'temperature', 'count': 17011},
  {'_id': 'P1', 'count': 18169}]

[33]: VimeoVideo("665412418", h="0c4b125254", width=600)

[33]: <IPython.lib.display.VimeoVideo at 0x7fb542b2f820>

Task 3.1.15: Use the aggregate method to calculate how many readings there are for each type
("humidity", "temperature", "P2", and "P1") in site 29.
• Perform aggregation calculations on documents using PyMongo.
[34]: result = nairobi.aggregate(
    [
        {"$match": {"metadata.site": 29}},
        {"$group": {"_id": "$metadata.measurement", "count": {"$count": {}}}}
    ]
)
pp.pprint(list(result))

[ {'_id': 'P2', 'count': 32907},
  {'_id': 'humidity', 'count': 33019},
  {'_id': 'temperature', 'count': 33019},
  {'_id': 'P1', 'count': 32907}]
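
A quick arithmetic check ties these results together: the per-measurement counts for each site should add up to that site's total from Task 3.1.10, and the two site totals should add up to the collection size from Task 3.1.7. The numbers below are copied from the outputs in this notebook; the variable names are illustrative.

```python
# Counts copied from the aggregation outputs for each site.
site_6 = {"P2": 18169, "humidity": 17011, "temperature": 17011, "P1": 18169}
site_29 = {"P2": 32907, "humidity": 33019, "temperature": 33019, "P1": 32907}

total_6 = sum(site_6.values())
total_29 = sum(site_29.values())

# Totals reported earlier by count_documents.
assert total_6 == 70360
assert total_29 == 131852
assert total_6 + total_29 == 202212

print(total_6, total_29, total_6 + total_29)
```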

1.3 Import
[35]: VimeoVideo("665412437", h="7a436c7e7e", width=600)

[35]: <IPython.lib.display.VimeoVideo at 0x7fb54437f9a0>

Task 3.1.16: Use the find method to retrieve the PM 2.5 readings from site 29. Be sure to limit
your results to 3 records only. Since we won’t need the metadata for our model, use the projection
argument to limit the results to the "P2" and "timestamp" keys only.
• Query a collection using PyMongo.
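[42]: result = nairobi.find(
    {"metadata.site": 29, "metadata.measurement": "P2"},
    projection={"P2": 1, "timestamp": 1, "_id": 0}
)
# No .limit() is applied here, so every matching reading is available
# when the results are loaded into a DataFrame in Task 3.1.17.
#pp.pprint(result.next())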
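
In an inclusion projection, keys marked `1` are kept, and `_id` is returned unless it is explicitly set to `0`. The function below, `project`, is a hypothetical in-memory imitation of that rule for top-level keys only. It is meant to show the shape of each returned record, not how MongoDB implements projections.

```python
from datetime import datetime


def project(document, projection):
    """Imitate an inclusion projection on top-level keys (illustrative only)."""
    keep = {key for key, flag in projection.items() if flag}
    # MongoDB includes _id by default unless the projection sets it to 0.
    if projection.get("_id", 1):
        keep.add("_id")
    return {key: value for key, value in document.items() if key in keep}


# Hand-typed sample; the _id value is a placeholder string, not a real ObjectId.
doc = {
    "_id": "placeholder-id",
    "P2": 34.43,
    "metadata": {"site": 29, "measurement": "P2"},
    "timestamp": datetime(2018, 9, 1, 0, 0, 2, 472000),
}

record = project(doc, {"P2": 1, "timestamp": 1, "_id": 0})
print(record)
```

Each record then carries only the reading and its timestamp, which is the shape a single-column, time-indexed table needs.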

[39]: VimeoVideo("665412442", h="494636d1ea", width=600)

[39]: <IPython.lib.display.VimeoVideo at 0x7fb542b3db80>

Task 3.1.17: Read records from your result into the DataFrame df. Be sure to set the index to
"timestamp".
• Create a DataFrame from a dictionary using pandas.
[43]: df = pd.DataFrame(result).set_index("timestamp")
df.head()

[43]:                             P2
timestamp
2018-09-01 00:00:02.472  34.43
2018-09-01 00:05:03.941  30.53
2018-09-01 00:10:04.374  22.80
2018-09-01 00:15:04.245  13.30
2018-09-01 00:20:04.869  16.57

[44]: # Check your work

assert df.shape[1] == 1, f"`df` should have only one column, not {df.shape[1]}."
assert df.columns == [
    "P2"
], f"The single column in `df` should be `'P2'`, not {df.columns[0]}."
assert isinstance(df.index, pd.DatetimeIndex), "`df` should have a `DatetimeIndex`."
