Wandio - STARDUST


Wandio

Wandiocat

wandiocat behaves like the Unix cat command: it writes the contents of a file to standard output. The main difference is that wandiocat can also read files over HTTP and files stored in an OpenStack Swift object store.

To output the contents of a file (the output can then be piped into another command):

user@vm001:~$ wandiocat <filename>

or

user@vm001:~$ wandiocat <url>
  • Downloads the contents of that URL and outputs it to standard out.

Wandio knows how to read a file from Swift. Just specify a path to the file using a Swift URI:

user@vm001:~$ wandiocat swift://<container name>/<object name>

Example

user@vm001:~$ wandiocat swift://telescope-ucsdnt-pcap-live/datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
  • wandiocat handles authenticating with Swift and streaming the contents of this file. Because the extension is .gz (the pcap is compressed with gzip), wandio decompresses the data as it is downloaded, so you do not have to download the file, decompress it, and then process it as separate steps.
  • Because pcap is a binary format, the output is not human-readable; printing it directly fills the terminal with raw binary data.
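
The object name in the example encodes the start of the hour it covers, both as year/month/day/hour path components and as a Unix timestamp in the file name. As a minimal sketch, assuming every hourly file follows the layout of that one example (an inference, not something this page states), the URI could be built from a timestamp in Python 3. The helper name hourly_pcap_uri is hypothetical.

```python
from datetime import datetime, timezone

# Container and datasource names are copied from the example above.
# The path layout is inferred from that single example (assumption).
CONTAINER = "telescope-ucsdnt-pcap-live"
DATASOURCE = "ucsd-nt"


def hourly_pcap_uri(epoch_seconds):
    """Build the Swift URI of the hourly pcap file that starts at epoch_seconds (UTC)."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return (
        "swift://{c}/datasource={d}/year={y:04d}/month={m:02d}/"
        "day={dd:02d}/hour={h:02d}/{d}.{ts}.pcap.gz"
    ).format(c=CONTAINER, d=DATASOURCE, y=t.year, m=t.month,
             dd=t.day, h=t.hour, ts=epoch_seconds)
```

Calling hourly_pcap_uri(1601197200) reproduces the URI used in the example, which can then be passed to wandiocat.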

The hexdump command (hd) takes binary data and writes it out in a more human-readable form.

user@vm001:~$ wandiocat swift://<container name>/<object name> | hd | head

Example

user@vm001:~$ wandiocat swift://telescope-ucsdnt-pcap-live/datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz | hd | head
00000000  d4 c3 b2 a1 02 00 04 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 01 00 01 00 00 00  90 54 70 5f 00 00 00 00  |.........Tp_....|
00000020  3c 00 00 00 3c 00 00 00  3c fd fe 19 d8 00 00 de  |<...<...<.......|
00000030  fb ba 06 c7 08 00 45 00  00 28 ff 9f 00 00 f2 06  |......E..(......|
00000040  91 18 2d 81 21 31 2c 6f  bc f6 a0 01 0d 64 42 99  |..-.!1,o.....dB.|
00000050  4e 0d 00 00 00 00 50 02  04 00 35 bf 00 00 00 00  |N.....P...5.....|
00000060  e9 75 10 0a 90 54 70 5f  01 00 00 00 3c 00 00 00  |.u...Tp_....<...|
00000070  3c 00 00 00 3c fd fe 19  d8 00 00 de fb ba 06 c7  |<...<...........|
00000080  08 00 45 00 00 2c 31 94  00 00 e7 06 07 58 df f7  |..E..,1......X..|
00000090  99 f4 2c 2d f4 c6 c5 fe  66 76 73 42 a3 24 00 00  |..,-....fvsB.$..|
  • Pipes the output of the wandiocat command into hexdump; head limits the output to the first ten lines.
  • Useful for inspecting the exact bytes in the pcap file.
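
To show how the hexdump maps onto the pcap file format, the following sketch decodes the first 40 bytes of the output above with Python's struct module: the 24-byte pcap global header followed by the 16-byte header of the first packet record. The field layout is the classic libpcap file format; the variable names are illustrative only.

```python
import struct

# First 40 bytes of the decompressed file, copied from the hexdump output above.
HEAD = bytes.fromhex(
    "d4 c3 b2 a1 02 00 04 00 00 00 00 00 00 00 00 00"
    "00 00 01 00 01 00 00 00 90 54 70 5f 00 00 00 00"
    "3c 00 00 00 3c 00 00 00"
)

# Global header: magic, version major/minor, thiszone, sigfigs, snaplen, link type.
# The bytes d4 c3 b2 a1 mean the file is little-endian, hence the "<" prefix.
magic, major, minor, thiszone, sigfigs, snaplen, linktype = struct.unpack(
    "<IHHiIII", HEAD[:24]
)

# Header of the first packet record: timestamp (seconds, microseconds),
# number of bytes captured, and original packet length on the wire.
ts_sec, ts_usec, incl_len, orig_len = struct.unpack("<IIII", HEAD[24:40])

print(hex(magic), major, minor, snaplen, linktype)
print(ts_sec, ts_usec, incl_len, orig_len)
```

Here the magic number decodes to 0xa1b2c3d4, the version to 2.4, the snapshot length to 65536 bytes, and the link type to 1 (Ethernet). The first packet is 60 bytes long and carries the timestamp 1601197200.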

Piping into tcpdump:

For instance, to pipe a pcap trace from the Swift object store into tcpdump:

user@vm001:~$ wandiocat swift://telescope-ucsdnt-pcap-live/datasource=ucsd-nt/year=X/month=Y/day=Z/ucsd-nt.TIMESTAMP.pcap.gz | tcpdump -r - <other tcpdump flags> | less

Note that the traces contain a large number of packets, so pipe the output into less to page through it.
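
The same pipeline can be driven from a script. The sketch below is one possible way to do it with Python's subprocess module, assuming Python 3.7 or later and that wandiocat and tcpdump are both on the PATH. The function names build_commands and stream_packets are hypothetical and not part of wandio.

```python
import subprocess


def build_commands(swift_uri, tcpdump_flags=()):
    """Return the argv lists for `wandiocat <uri>` and `tcpdump -r - <flags>`."""
    return ["wandiocat", swift_uri], ["tcpdump", "-r", "-"] + list(tcpdump_flags)


def stream_packets(swift_uri, tcpdump_flags=()):
    """Yield tcpdump's text output one line at a time."""
    cat_cmd, dump_cmd = build_commands(swift_uri, tcpdump_flags)
    cat = subprocess.Popen(cat_cmd, stdout=subprocess.PIPE)
    dump = subprocess.Popen(
        dump_cmd, stdin=cat.stdout, stdout=subprocess.PIPE, text=True
    )
    # Drop our copy of the pipe so wandiocat sees a broken pipe
    # if tcpdump exits before the whole trace has been read.
    cat.stdout.close()
    try:
        for line in dump.stdout:
            yield line.rstrip("\n")
    finally:
        dump.stdout.close()
        dump.wait()
        cat.wait()
```

Because stream_packets is a generator, the caller can stop after a few packets instead of reading the whole trace, much like quitting less early. Passing standard tcpdump flags such as ["-n", "-c", "10"] limits the work tcpdump does.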

Man page for tcpdump


pywandio

pywandio provides Python bindings for libwandio, giving Python programs a high-level file I/O interface.

To use pywandio to access Swift objects:

import wandio
with wandio.open("swift://<container>/<object>") as fh:
    # use fh like a normal file, e.g. iterate over its lines
    for line in fh:
        print(line)

Methods

open

Open a file from the given file path.

input the path of the file to open (for example, a Swift URI)

return a file handle which can be used to read the file.

Syntax Ex.

wandio.open(<name of swift file>)

Example

wandio.open("swift://data-telescope-meta-rsdos-daily/year=2021/month=01/day=12/ucsd-nt.rsdos-daily-attacks.2021-01-12.ts=1610409600.csv.gz")

close

Closes the file that was opened.

input none

return none

Syntax Ex.

fh.close()

next

Returns the next item from the file handle.

input none

return the next item in the file.

Syntax Ex.

fh.next()

Example

fh.next()

1.179.217.50,1,42,1,151,14949,49,1610425426,1610425715,131293,TH,AS

read

Reads all the text in the file.

input optional

return all the text in the file

Syntax Ex.

fh.read()

readline

Read the next line in the file.

input optional

return the next line in the file

Syntax Ex.

fh.readline()
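
The methods above share one read position, so they can be mixed: each call continues where the previous one stopped. The sketch below illustrates this with io.StringIO as a stand-in for a pywandio handle, on the assumption (suggested by "use fh like a normal file" above) that pywandio handles follow the usual Python file semantics. The sample text and the helper name first_line_then_rest are placeholders.

```python
import io


def first_line_then_rest(fh):
    """Read the first line with readline(), then collect the rest by iterating."""
    first = fh.readline()
    rest = [line for line in fh]
    return first, rest


# Stand-in for a handle returned by wandio.open(); the content is placeholder text.
stand_in = io.StringIO("line one\nline two\nline three\n")
first, rest = first_line_then_rest(stand_in)

print(repr(first))
print(rest)
print(repr(stand_in.read()))  # nothing is left once the handle is exhausted
```

This pattern is handy for files with a header row: readline() consumes the header and the loop then sees only the data lines.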

Usage

# this script should work with both python2 and python3
import wandio

files = [
         'http://data.caida.org/datasets/as-relationships/README.txt',
         'http://example.caida.org/data/external/as-rank-ribs/19980101/19980101.as-rel.txt.bz2'
         ]
for filename in files:
    # the with statement automatically closes the file at the end
    # of the block
    try:
        with wandio.open(filename) as fh:
            line_count = 0
            word_count = 0
            for line in fh:
                word_count += len(line.rstrip().split())
                line_count += 1
        # print the number of lines and words in file
        print(filename)
        print(line_count, word_count)
    except IOError as err:
        print(filename)
        raise err