Group of authors

Social Network Analysis



be achieved with the help of Social Network Analysis (SNA).

      This section introduces the basic syntax and coding styles of Python, as well as the different library packages and their significance [11].

      2.5.1 Comparison of Python With Traditional Tools

      1 Python is free and open source, whereas MS Excel is a paid package.

      2 Python copes easily with complex equations and huge data sets, while Excel is good only for small data sets.

      3 Since Python is open source, anyone can audit or replicate a piece of work, which is not possible in Excel.

      4 Finding errors and debugging them is much easier in Python than in Excel.

      5 Excel is simpler to use than Python, i.e., the user does not need any programming knowledge.

      6 Repetitive tasks can easily be automated in Python, which is not possible in Excel.

      7 Python provides in-depth visualizations, whereas Excel offers only basic graphs [12].

      Python installation takes a bit more time because it must be downloaded and set up in the right environment with all the necessary packages [13]. The standard version of Python can be installed from the following link [https://www.python.org/downloads/].

      Different versions of Python with respect to the type of OS (Windows, Mac, Linux) can be found under this link.

      Important packages for SNA include pandas, matplotlib, and NetworkX. All these packages can be installed via pip:

       – pip install pandas

       – pip install networkx

       – pip install matplotlib

      NetworkX is an important library used to analyze social networks in Python [14]. The package is designed for the creation, manipulation, and study of complex graph structures. It is a free package released under the BSD license.
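As a minimal illustration of NetworkX (the names and edges here are invented toy data, not from the case study), a small friendship network can be built and queried in a few lines:

```python
import networkx as nx

# Build a small undirected friendship graph (toy data)
G = nx.Graph()
G.add_edges_from([
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Carol", "Alice"),
    ("Carol", "Dave"),
])

print(G.number_of_nodes())           # 4
print(G.number_of_edges())           # 4
print(sorted(G.neighbors("Carol")))  # ['Alice', 'Bob', 'Dave']
```

Nodes can be arbitrary hashable Python objects (here, strings), which is what makes NetworkX convenient for social data: user names or IDs can be used as nodes directly.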

      Figure 2.7 Python official documentation.

      2.6.1 Good Practices

      1 It is always advisable to install a virtual environment manager such as Anaconda. Miniconda can be used instead of Anaconda if the computer has less than 5 GB of RAM [15]. You can download the standard version of Anaconda here [https://docs.anaconda.com/anaconda/install/].

      2 A choice of editors, such as VS Code, PyCharm, IntelliJ, or Jupyter Notebook, comes along with the Anaconda environment.

      3 Proceed with the open-source version at the beginning. Use Anaconda Navigator (interactive visual mode) or the prompt (terminal mode):

       – Creating a new environment in Anaconda: conda create --name myenv (replace myenv with the environment name)

       – Activating the environment: conda activate myenv

       – Installing packages: conda install [packagename]

      More useful resources and explanations on working with conda environments can be found in the official documentation.

      Figure 2.8 Anaconda navigator.


      Figure 2.9 Conda environment installation.

      Some interesting case studies based on SNA are Facebook friends’ groups and terrorist activities [16]. The case studies have been worked out in Python with Jupyter Notebook. You can download and explore the data set to get more insight under the following link.

      Scan the QR code and follow the Github link to access the worksheets.

      Figure 2.10 QR code for workbooks and source codes.

      2.7.1 Facebook Case Study

      The first important step in analyzing any kind of data set in Python is importing libraries. The data to be analyzed can be scraped directly from the respective site, or it can be accessed through the API provided by the website [17]. Choosing the data mainly depends on the need, i.e., why do we need to analyze the data? What is the purpose? What kind of problem are we solving? [18]

      Step 1: Import libraries

      Each library has its own built-in functions, which make Python easy to code with.

Snapshot of code blocks for importing libraries.
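A typical import block for this kind of analysis might look as follows (the aliases pd, nx, and plt are common conventions, not requirements):

```python
import pandas as pd              # tabular data handling
import networkx as nx            # graph construction and analysis
import matplotlib.pyplot as plt  # plotting and network visualization
```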

      Step 2: Read data

      Pandas is used to retrieve the data and makes it convenient to explore a huge data set.


      Figure 2.12 Code block for reading data.
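Reading the edge list with pandas might look like the sketch below. The file name facebook_combined.txt and the sample rows are assumptions for illustration; substitute the actual file from the downloaded data set:

```python
import io
import pandas as pd

# In practice the downloaded file would be read directly, e.g.:
#   df = pd.read_csv("facebook_combined.txt", sep=" ", names=["source", "target"])
# Here a small in-memory sample stands in for the file:
sample = io.StringIO("0 1\n0 2\n1 2\n2 3\n")
df = pd.read_csv(sample, sep=" ", names=["source", "target"])

print(df.head())  # inspect the first rows
print(df.shape)   # (4, 2): four edges, two columns
```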

      Step 3: Data cleaning

      Data cleaning means removing the noise (NaN, missing data) [19]. Data quality has a strong impact on the model, so using data with less noise is recommended for better results. Missing values can be replaced by the mean, the median, and so on [20–22]. The right choice depends entirely on the type of data.
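The two most common cleaning operations can be sketched on a toy frame (the column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy data frame with missing values (NaN)
df = pd.DataFrame({
    "user": ["a", "b", "c", "d"],
    "age": [23, np.nan, 31, np.nan],
})

dropped = df.dropna()                          # remove rows containing NaN
filled = df.fillna({"age": df["age"].mean()})  # impute NaN with the mean (27.0)

print(len(dropped))            # 2
print(filled["age"].tolist())  # [23.0, 27.0, 31.0, 27.0]
```

Whether to drop or impute, and with which statistic, depends on the data, as noted above.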

      Step 4: Read input

      read_edgelist is a built-in function of the NetworkX library. More details about it can be found on the documentation website [23].

Snapshot of code block for reading edge list.
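A sketch of the call follows; here an in-memory sample replaces the actual edge-list file (in practice one would pass the file path, e.g. nx.read_edgelist("facebook_combined.txt"), a hypothetical file name):

```python
import io
import networkx as nx

# read_edgelist parses one "source target" pair per line
data = io.StringIO("0 1\n0 2\n1 2\n2 3\n")
G = nx.read_edgelist(data, nodetype=int)

print(G.number_of_nodes())  # 4
print(G.number_of_edges())  # 4
```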

      Step 5: Visualizing the network


      Figure 2.14 Visualization of Facebook users.
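The drawing step can be reproduced on a small random graph (a stand-in for the Facebook network; the node count, layout, and sizes are arbitrary choices for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import networkx as nx

# Random graph standing in for the real network (toy data)
G = nx.erdos_renyi_graph(30, 0.1, seed=42)

plt.figure(figsize=(6, 6))
nx.draw(G, node_size=50, with_labels=False)
plt.savefig("network.png")  # in a Jupyter notebook, plt.show() displays it inline
```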

Snapshot of code block for centrality measures.

      Figure
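Centrality measures quantify how important a node is within the network. A minimal sketch on a star graph, where the expected values are easy to verify by hand:

```python
import networkx as nx

# Star graph: node 0 is connected to nodes 1..4
G = nx.star_graph(4)

deg = nx.degree_centrality(G)       # degree / (n - 1)
bet = nx.betweenness_centrality(G)  # fraction of shortest paths through a node

print(deg[0])  # 1.0 (node 0 touches all four other nodes)
print(bet[0])  # 1.0 (every shortest path between leaves passes through node 0)
```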