BusyBear

From ESE205 Wiki
Revision as of 23:57, 16 April 2019 by Tom goon (talk | contribs) (→‎Links)
Jump to navigation Jump to search

Project Proposal

Overview

It always seems like an impossible task to find an open table to work or a quick line for food across the WashU campus. BusyBear's goal is to create a database that is accessible to WashU students that will show the population and busyness trends of popular locations on campus, beginning with Bear's Den. By using a network adapter connected to the Raspberry Pi, we will receive an approximate measurement of busyness based on the number of found MAC addresses for a specific region. By looking at pictures taken simultaneously with the MAC address collection, a historic trend between the number of found MAC addresses and relative busyness can be determined. We hope to be able to store this information in a database hosted by AWS and display this data on a website. Our end goal is to gather information to allow the WashU community to create more educated decisions regarding where to go and when to go there.

Team Members

Thomas Emerson
Tom Goon
Allison Todd
David Tekien, TA
Jim Feher, Instructor

Links

[Project Log]
[Project Presentation]
[GitHub Repository]
[Network Adapter Monitoring Mode Tutorial]
[Final Poster]

Objectives

  • Learn and be able to code in Python as related to the Pi
  • Use sniffing/MAC tracking method in the analysis of busyness
  • Investigate the use of the camera in the analysis of busyness
  • Be able to monitor busyness fairly accurately by filtering detected devices
  • Compare busyness at different times of day and between buildings
  • Design a GUI for an aesthetically pleasing and useful website
  • Host a website displaying useful and relevant data through Amazon Web Services (AWS)

Challenges

  • Limited experience with working with WiFi receivers or anything to do with MAC Addresses
  • Limited knowledge of Python and Raspberry Pi
  • Connecting our data with a database, AWS, and a website
  • Privacy Concerns

Gantt Chart

GanttChart 1.png

Budget

Item Description Cost Link
AWS Website Hosting $5 / month https://aws.amazon.com/pricing/?nc2=h_ql_pr
2 x TL-WN722N Network Adapter returned: $7.21 https://www.amazon.com/TP-Link-TL-WN722N-Wireless-network-Adapter/dp/B002SZEOLG
1 x 5dBi Long Range WiFi for Raspberry Pi Network Adapter returned: $5.00 https://www.amazon.com/5dBi-Long-Range-WiFi-Raspberry/dp/B00YI0AIRS/ref=lp_9026119011_1_1?srs=9026119011&ie=UTF8&qid=1550447401&sr=8-1
1 x Alfa AWUSO36NH High Gain USB Wireless G/N Long-Range WiFi Network Adapter Network Adapter $31.99 https://www.amazon.com/Alfa-AWUSO36NH-Wireless-Long-Rang-Network/dp/B0035APGP6/ref=sr_1_1_sspa?keywords=alfa+network+adapter&qid=1553045771&s=gateway&sr=8-1-spons&psc=1
mybusybear.com Domain Name $12.00 DomainPrice.jpg
Total Cost $71.20

Design and Solutions

Basic Idea
http://www.mybusybear.com

Build the Device

We began by constructing a device to collect MAC addresses. Initially, we hoped that with the RaspberryPi's WiFi capabilities, we could simply use the base hardware for detection. We quickly determined that the RaspberryPi was not capable of entering a monitoring mode[1]; we would need external hardware to serve this purpose. We went through a variety of external network adapters, and ultimately found one with both monitoring mode capabilities and compatibility with the RaspberryPi [2]. Using the Network Adapter's functionality will be explored further in the Collect Information section and in the Network Adapter in Monitoring Mode tutorial[3].
We decided that a RaspberryPi camera should be added to the device to strengthen the validity of the data gathered from the network adapter. The Pi Camera is fairly simple to connect and the functionality is implemented through Pi commands[4]. By analyzing a combination of the number of addresses collected and the visual busyness found in the picture, more accurate trends over time can be determined.

Collect Information

In the before mentioned tutorial [Network Adapter Monitoring Mode Tutorial] we established how setup the network adapter in monitoring mode and install kismet, the software we used to utilize monitoring mode. Once properly configured, simply calling kismet spews out text into the console as so:

INFO: 802.11 Wi-Fi device 00:A7:42:FC:6E:03 advertising SSID
      'wustl-guest-2.0'
INFO: Detected new 802.11 Wi-Fi access point 00:A7:42:FC:6E:00
INFO: 802.11 Wi-Fi device 00:A7:42:FC:6E:00 advertising SSID 'eduroam'
INFO: Detected new 802.11 Wi-Fi device 74:B5:87:C6:90:1E
INFO: Detected new 802.11 Wi-Fi device 00:08:E3:FF:FD:EC

We could collect data, but we needed a way to be able to consolidate the MAC Addresses, find out what device it belonged to, and upload that information to the database all periodically throughout a gap of time. Over a couple of months, we finalized our data collection design to utilize crontab [5] , a program used the schedule execution of programs at certain times. Crontab was also utilized to setup emailing the Pi's IP address on boot, as detailed in a class tutorial. [6] Crontab was used to schedule two tasks: Running kismet and dumping the output into a text file, and running a script to parse through the text file and upload the necessary information.

The crontab usage can be seen below. The first program runs kismet through the timeout modifier [7] such that it only runs for four minutes. All the contents of the output is written to the the kismetlog.txt

# m h  dom mon dow   command
*/5 *  *   *   *    /usr/bin/timeout 240 /usr/local/bin/kismet > kismetlog.txt
*/5 *  *   *   *    ./busybear2

The second task to run is an executable file named "busybear2" [8] whose contents is shown below.

sleep 242s
python3 uploader.py

This bash script's only purpose is to wait 4 minutes and 2 seconds (essentially waiting for the kismet task to terminate) before executing the uploader script. The contents of "uploader.py" can be seen below.

# Regular Expression, only gets MAC Addresses after it sees "device"
MAC_regex = re.compile(r"(?<=\bdevice\b\s)\w\w[:]\w\w[:]\w\w[:]\w\w[:]\w\w[:]\w\w")

# Loop through the lines of the file to find MAC Addresse
for line in testFile:
  MAC_addresses = MAC_regex.findall(line) # Compile all found mac addresses in var MAC_addresses
  for address in MAC_addresses: # Loop through the individual MAC Addresses
    req = requests.get(MAC_URL%address)
    obj = req.json()
    for key, value in obj.items():
      if('company' in value):
        values = (address,value['company'])
      else:
      	values = (address,'Null')
      cursor.execute(qString,values)

The uploader script uses regular expressions [9] [10] [11] in order to isolate the desired MAC addresses from the text file constructed in the first crontab task. From there, it utilizes a MAC Address Vendor Lookup API [12] in order to attach a vendor name to a MAC Address. From there, the MAC Address and its associated vendor is uploaded to our mySQL database [13] whereupon it is automatically assigned a timestamp and unique ID. More on how our database was created and structure in the next section. NOTE: Not all code is shown, full code can be accessed on our GitHub.

Managing a Database

RDD database through AWS named BusyBear. Which we can access through MySQL workbench. From the database, we created multiple tables to story current MAC addresses and historical information. It was important for us that the table storing MAC addresses auto populate timestamps with the current time. As far as externally connecting the database to both the Pi and the website, we found that using a combination of python and PHP was the most effective. It is important to note, however, to keep the database secure we had to escape all incoming data to prevent SQL injection attacks. As of now, our database isn't in 3rd normal form however this is a goal we have to store and access data most efficiently. One hurdle was attempting to store pictures in the database (as LONGBLOB) but we discovered that because a) the complexity caused instability, and b) picture's large size made queries take an exceptional amount of time. It was best to just upload images directly to the website and bypass the database.

Databases and Structures Used

Contained within our uploader.py program (see GitHub for full code), we are uploading ALL MAC Addresses to a single table. From there, we sort this information into more tables. The basic idea can be seen by the strings used to access/alter the mySQL table below:

qString = 'INSERT INTO wifiMAC_BD (macAdd, vendor) VALUES (%s, %s)'

The first qString demonstrates what happens first; we upload every MAC Address, regardless of what it was, into a database named wifiMAC_BD. We can see an example of the entries below. Notice all the MAC Addresses, the timestamp they were inserted, and the vendors that refer to each address. Null simply means our API did not have a registered vendor under that MAC Address.


wifiMAC_BD table


qString2 = "SELECT count(*) FROM wifiMAC_BD where (timestampe > now() - interval '5' minute) AND ( vendor = 'Apple, Inc.' OR  vendor = 'Google, Inc.' Or  vendor ='Microsoft' Or  vendor ='Samsung Electronics Co.,Ltd' OR  vendor ='HUAWEI TECHNOLOGIES CO.,LTD');" 
qString3 = "INSERT INTO historicalData_BD_Limited (numAddresses,helper)  VALUES (%s,%s)"

Next, using qString2, we query the database with all the entries and only record the number of entries from the past 5 minutes (our code runs every 5 minutes) and that have a valid vendor. Valid vendors are those that probably corresponds to devices used by students, like Apple, Samsung, Microsoft, etc. This filters out any noise like routers, printers, and many more random devices at any given time which gives consistency to our design. The next qString inserts this number into a separate table called historicalData_BD_Limited as seen in qString3. This table is used to create our charts located on our website. An example of what this table's entries looks like can be seen below. The important piece is that there is an entry representing the total number of filtered devices at a given 5 minute interval.


historicalData_BD_Limited table


Create a Website

Website Design

We initially imagined that our website would flow like this:

Website Design Flowchart


At the moment our plan is to create a website very similar to this, but with direct access to the available location from the home page. Our current website should function in the following:

Website Design Flowchart



The Website is hosted by AWS and uses a LightSail instance running LAMPS (which comes preinstalled with apache2, PHP, and MySQL support) which made the setup significantly easier. Each page of the website is a PHP, CSS, HTML, and Javascript hybrid which uses PHP to access data from and to the database while HTML, CSS, and Javascript are used to display this information. Specifically, we use Javascript functions created by Google as well as code from https://canvasjs.com/html5-javascript-bar-chart/ to display the relevant graphs. To make the website more professional, we opted to buy (rent) a domain name for Amazon Route 53. This required us to rework the DNS preferences of the domain but ultimately was successful in connection to our Lightsail instance. Our final website can be found here: http://www.mybusybear.com

Creating Charts

Our measure of busyness is completely relative. That is, it relies on us having a lot of reliable data. We do so by having many entries of the number of devices in a given location over a long period of time. From there, we can find the maximum and minimum number of devices in the history of the location. This gives us a relative "empty" and "full" which we can then compare the current busyness. Similar thinking can be applied to the historical information (hour periods over a given day) to construct relative charts. Doing so actually gives us a good estimate of how busy Bear's Den is at a given point of time. Full code can be seen in our BusyBear_BD.php file located on our GitHub.


To display live busyness, we went with a pie chart because it gives a really easy to see visualization of how free or busy the location is. A slice of a pie gives a very instantaneous and intuitive sense of how free the location is which gives useful insight on whether to go to the location or not. For historical busyness, we went with a bar graph over hour long times. This gives a very nice visual trend over time. You can see spikes or periods of high busyness that make logical sense.


Below we can see a brief snippet of the start of the code to get the maxes and mins so that you can locate it on our Git. Examples of the charts can also be seen in the Results tab.

$sql2 = "SELECT Max(numAddresses) AS max FROM historicalData_BD_Limited";
$sql3 = "SELECT Min(numAddresses) AS min FROM historicalData_BD_Limited";
$sql4 = "SELECT numAddresses AS current FROM historicalData_BD_Limited where timestampe > now() - interval '5' minute";

Results


Insert picture of live chart

Insert picture of historical chart

Next Steps

There are several areas where we can improve upon or explore.

  1. Increasing capacity in which the camera is used: Currently, the pictures captured by the Pi are strictly for reference only. It would be a more robust system if some sort of human detection was implemented and automatically analyzed to refine our busyness level.
  2. Privacy and security concerns: Obviously nothing strictly illegal was done, however, that does not mean there are no worries. While sniffing packets is done regularly by numerous devices, apps, etc. there is still a concern of proper ethical considerations into tracking and monitoring people. Additionally, pictures being taken and stored could potentially capture damaging information. All of this paired with the fact we did not encrypt any information is certainly a front in which we would strengthen.
  3. Expanding the range and operation. With only one Pi and its limitations in that it is a rather bulky physical object means that it is hard to find a consistent place to operate. Addressing this and expanding where we operate would certain broaden the scope of the project.

References

Not Quoted

Past Projects

Pi Blinking LED (tutorial sake)

nmap (unused in the end)

fping (unused in the end)

openCV (unused in the end)

kismet & monitoring mode (referenced in our tutorial)

Regex/Dictionary/API

Quoted

  1. Pi Network - [1]
  2. Network Adapter - [2]
  3. Network Adapter in Monitoring Mode Tutorial - [3]
  4. Pi Camera - [4]
  5. How to use Crontab - [5]
  6. SSHing into your Pi Tutorial - [6]
  7. Using timeout with crontab - [7]
  8. How to make a file executable - [8]
  9. CSE 330 Wiki: Regular Expressions - [9]
  10. Regular expressions look-ahead/behind - [10]
  11. Online Regular Expression tester - [11]
  12. MAC Address Vendor Lookup API - [12]
  13. Uploading data to a mySQL database - [13]