Kevin D. Smith

SAS Viya



      The easiest way to load data into a CAS server is to call the upload method on the CAS connection object. This method accepts a file path or URL that points to a file in one of several supported formats, including CSV, Excel, and SAS data sets. You can also pass a Pandas DataFrame to the upload method to load its contents into a CAS table. The following example loads the classic Iris data set.

      In [12]: out = conn.upload('https://raw.githubusercontent.com/' +
         ....:                   'pydata/pandas/master/pandas/tests/' +
         ....:                   'data/iris.csv')

      In [13]: out

       Out[13]:

      [caslib]

      'CASUSER(username)'

      [tableName]

      'IRIS'

      [casTable]

      CASTable('IRIS', caslib='CASUSER(username)')

      + Elapsed: 0.0629s, user: 0.037s, sys: 0.021s, mem: 48.4mb

      The output from the upload method is, again, a CASResults object. It contains the name of the created table, the CASLib in which the table was created, and a CASTable object that you can use to interact with the table on the server. A CASTable object has all of the CAS action sets and action methods of the connection that created it. It also supports many of the methods defined by Pandas DataFrames, so you can operate on it as if it were a local DataFrame. However, operations on a CASTable are simply combined on the client side (essentially creating a client-side view); no data is retrieved from the server until you explicitly fetch it or call a method that returns data from the table, such as head or tail.
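      The deferred behavior described above can be pictured with a small pure-Python toy. This is only a sketch of the general idea, not how SWAT implements CASTable: the LazyTable class, its fetch_log attribute, and the three sample rows are all hypothetical names and data introduced for illustration. Selecting columns merely records the request, and rows are touched only when head is called.

```python
class LazyTable:
    """Toy stand-in for a server-side table handle (hypothetical, not SWAT)."""

    def __init__(self, rows, columns=None, fetch_log=None):
        self._rows = rows          # pretend these rows live on the server
        self._columns = columns    # None means "all columns"
        # Shared log of every simulated round trip to the "server".
        self.fetch_log = fetch_log if fetch_log is not None else []

    def __getitem__(self, columns):
        # Column selection only records the request; no rows are read.
        return LazyTable(self._rows, list(columns), self.fetch_log)

    def head(self, n=5):
        # Data is "retrieved" only here, with the recorded selection applied.
        self.fetch_log.append(('head', n, self._columns))
        rows = self._rows[:n]
        if self._columns is None:
            return [dict(row) for row in rows]
        return [{col: row[col] for col in self._columns} for row in rows]


# Illustrative rows standing in for a table held on the server.
server_rows = [
    {'SepalLength': 5.1, 'SepalWidth': 3.5, 'Name': 'Iris-setosa'},
    {'SepalLength': 4.9, 'SepalWidth': 3.0, 'Name': 'Iris-setosa'},
    {'SepalLength': 4.7, 'SepalWidth': 3.2, 'Name': 'Iris-setosa'},
]

table = LazyTable(server_rows)
view = table[['SepalLength', 'Name']]    # builds a client-side view only
fetches_before = len(table.fetch_log)    # still no simulated fetches
first_two = view.head(2)                 # the first simulated fetch happens here
```

      In this toy, building the view costs nothing; the only simulated round trip is the one made by head, which mirrors the idea that a client-side view stays cheap until data is actually requested.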

      We can use actions such as tableinfo and columninfo to access general information about the table itself and its columns.

       # Store CASTable object in its own variable.

      In [14]: iris = out.casTable

       # Call the tableinfo action on the CASTable object.

      In [15]: iris.tableinfo()

       Out[15]:

      [TableInfo]

         Name  Rows  Columns Encoding CreateTimeFormatted  \
      0  IRIS   150        5    utf-8  01Nov2016:16:38:59

           ModTimeFormatted JavaCharSet    CreateTime       ModTime  \
      0  01Nov2016:16:38:59        UTF8  1.793638e+09  1.793638e+09

         Global  Repeated  View SourceName SourceCaslib  Compressed  \
      0       0         0     0                                   0

          Creator Modifier
      0  username

      + Elapsed: 0.000856s, mem: 0.104mb

       # Call the columninfo action on the CASTable.

      In [16]: iris.columninfo()

       Out[16]:

      [ColumnInfo]

              Column  ID     Type  RawLength  FormattedLength  NFL  NFD
      0  SepalLength   1   double          8               12    0    0
      1   SepalWidth   2   double          8               12    0    0
      2  PetalLength   3   double          8               12    0    0
      3   PetalWidth   4   double          8               12    0    0
      4         Name   5  varchar         15               15    0    0

      + Elapsed: 0.000727s, mem: 0.175mb

      Now that we have some data, let’s run some more interesting CAS actions on it.

      The simple action set that comes with CAS contains some basic analytic actions. You can use either the help action or the IPython ? operator to view the available actions.

      In [17]: conn.simple?

      Type:        Simple
      String form: <swat.cas.actions.Simple object at 0x4582b10>
      File:        swat/cas/actions.py
      Definition:  conn.simple(self, *args, **kwargs)
      Docstring:
      Analytics

      Actions
      -------
      simple.correlation : Generates a matrix of Pearson product-moment
                           correlation coefficients
      simple.crosstab    : Performs one-way or two-way tabulations
      simple.distinct    : Computes the distinct number of values of the
                           variables in the variable list
      simple.freq        : Generates a frequency distribution for one or
                           more variables
      simple.groupby     : Builds BY groups in terms of the variable value
                           combinations given the variables in the variable
                           list
      simple.mdsummary   : Calculates multidimensional summaries of numeric
                           variables
      simple.numrows     : Shows the number of rows in a Cloud Analytic
                           Services table
      simple.paracoord   : Generates a parallel coordinates plot of the
                           variables in the variable list
      simple.regression  : Performs a linear regression up to 3rd-order
                           polynomials
      simple.summary     : Generates descriptive statistics of numeric
                           variables such as the sample mean, sample
                           variance, sample size, sum of squares, and so on
      simple.topk        : Returns the top-K and bottom-K distinct values of
                           each variable included in the variable list based
                           on a user-specified ranking order
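      The summary action reports its statistics under short column names such as USS, CSS, CV, and TValue. As a rough guide to reading them, the following pure-Python sketch computes the same kinds of quantities for a small made-up list of numbers. The formulas are the usual textbook definitions (sample variance with an N - 1 denominator, coefficient of variation as a percentage, t value for a zero-mean hypothesis); treating them as what the action computes is an assumption, and the summarize function is a hypothetical helper, not part of SWAT.

```python
import math


def summarize(values):
    """Textbook descriptive statistics for one numeric column (illustrative)."""
    n = len(values)
    total = sum(values)
    mean = total / n
    uss = sum(v * v for v in values)             # uncorrected sum of squares
    css = sum((v - mean) ** 2 for v in values)   # corrected sum of squares
    var = css / (n - 1)                          # sample variance
    std = math.sqrt(var)                         # sample standard deviation
    stderr = std / math.sqrt(n)                  # standard error of the mean
    return {
        'Min': min(values), 'Max': max(values), 'N': n,
        'Mean': mean, 'Sum': total, 'Std': std, 'StdErr': stderr,
        'Var': var, 'USS': uss, 'CSS': css,
        'CV': 100 * std / mean,                  # coefficient of variation, in percent
        'TValue': mean / stderr,                 # t statistic for H0: mean = 0
    }


# Made-up values, chosen only to keep the arithmetic easy to follow.
stats = summarize([2.0, 4.0, 4.0, 6.0])
```

      Under these definitions the two sums of squares are linked by USS = CSS + N * Mean**2, which is a convenient consistency check when reading a summary table.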

      Let’s run the summary action on our CAS table.

      In [18]: summ = iris.summary()

      In [19]: summ

       Out[19]:

      [Summary]

      Descriptive Statistics for IRIS

              Column  Min  Max      N  NMiss      Mean    Sum       Std  \
      0  SepalLength  4.3  7.9  150.0    0.0  5.843333  876.5  0.828066
      1   SepalWidth  2.0  4.4  150.0    0.0  3.054000  458.1  0.433594
      2  PetalLength  1.0  6.9  150.0    0.0  3.758667  563.8  1.764420
      3   PetalWidth  0.1  2.5  150.0    0.0  1.198667  179.8  0.763161

           StdErr       Var      USS         CSS         CV     TValue  \
      0  0.067611  0.685694  5223.85  102.168333  14.171126  86.425375
      1  0.035403  0.188004  1427.05   28.012600  14.197587  86.264297
      2  0.144064  3.113179  2583.00  463.863733  46.942721  26.090198
      3  0.062312  0.582414