Big Basket V1.0
St. Petersburg Institute for Informatics RAS
Data Mining
Deep Data Diver
1. Purpose
The Big Basket system is designed for market basket analysis. It applies a new technology for finding association rules, based on a modified apparatus of linear algebra that uses a data self-organization procedure and the effect of information structure resonance. These properties make it possible to find high-accuracy associations between sets of transaction elements and a given element in the data. Such sets form a basket with high support and long itemsets.
2. Statement of Market Basket Analysis Task
A market basket is the set of commodities (services) purchased by a customer within a single transaction, for instance during a visit to a supermarket or grocery, or an online purchase in a virtual store such as Amazon.com. By registering business operations over the whole period of their activity, companies offering commodities or services accumulate large collections of such transactions (databases). One of the most common tasks in the statistical analysis of such databases is to find commodities and itemsets that occur together in many transactions. The customer behavior patterns revealed by this analysis are generally characterized by the list of commodities in the set and the number of transactions containing the set. Trading companies use these patterns to arrange commodities in stores more effectively, to restructure commodity catalogues and web pages, to form packages of services that are bought together, and so on.
A set consisting of i commodities is called an i-itemset. The percentage of transactions containing the set is called the "support" of the set. For a set to be of interest, its support should exceed a minimum established by the user; such sets are called frequent.
For an itemset, a "confidence" characteristic is also often used; it reflects the accuracy with which an algorithm reveals the set. The accuracy is usually determined with respect to one of the set's items: it equals the probability that the i-th element joins the set given that the other i - 1 elements are already included. The higher the confidence of the chosen set, the more significant the set is in practice. The length of the i-itemset is another important characteristic.
3. General Information of Big Basket Operation
3.1. Initial Data Format
Initial data should be represented in one of two formats, which the Big Basket system identifies automatically:
3.1.1. List
of transactions in which the commodities included in a transaction are
separated from each other by some separator. Data fragment is given below:
3.1.2. The flat table, in which the column headings are commodity names and the rows correspond to transactions. Each cell takes one of two values, "yes" or "no", depending on whether the given commodity is included in the given transaction (Fig. 1).
Fig. 1. Example of Initial Data Presentation
After the initial data are read, four main parameters are set in the Big Basket system:
1. The commodity Ai with which associations are to be found.
2. The transaction number (line in the Data Table) for which the most complete association with the given accuracy is sought.
3. The desired level of association error (accuracy).
4. The minimum level of support for transactions containing the given item.
At the first stage, the user selects the desired commodity Ai (as a rule, the most frequently purchased one) and sets the planned error level for the association rule. The system then automatically finds the first association with product Ai and calculates its Confidence and Support. At the next stage, the system selects the transaction that is most saturated with Ai commodity purchases and has not been covered by the first association, and finds the second association with product Ai for it. Together, the two associations cover more transactions than either covers alone. The procedure continues in the same way for the transactions not yet covered, until all associations with product Ai that satisfy the given parameters are found.
4. The System Operation
4.1. A general view of the system is shown in Fig. 2.
Fig. 2. Starting with the System Operation
As an
example of the system operation, let us take the supmart.tra file from the examples supplied with the well-known software product CBA (http://www.comp.nus.edu.sg/~dm2); the commercial version of CBA costs 2,000 US dollars.
Start
the Project creation wizard with a left mouse click. A dialog box for selecting the data source appears (Fig. 3).
Fig. 3. Dialog Box for Selection of Data Source
Select, for example, ODBC and press OK. A dialog box for selecting the ODBC driver appears. Select the Microsoft Excel driver (as shown in Fig. 4).
Fig. 4. Selection of Excel Driver
Press the Connect button. In the dialog box that appears, press the Выбор книги (Select Workbook) button, as it is labeled in the Russian version of the operating system (Fig. 5).
Fig. 5. ODBC *.xls Driver Installation
Then, in the Выбор книги (Select Workbook) dialog box (Fig. 6), select the *.xls file to be analyzed (in our case the initial file supmart.tra was converted to supmart.xls).
Fig. 6. Выбор книги (Select Workbook) for *.xls
Then press OK again in the dialog box.
Fig. 7. Dialog Box for Creation of SQL Query to the Selected Excel Book
Here, in the Table field, select the name Data, which we gave to the data table of transactions in Excel. At the bottom, the SQL query field immediately shows the formal text of our query in SQL. Press OK, and the system reads the data.
Click the Options button; a dialog box for adjusting the system settings appears (Fig. 8).
Fig. 8. Dialog Box for System Settings Adjustment
On the
first tab of the dialog box, set the planned errors for the associations that the system will search for. Here it is also possible to change the self-organization parameter of the association search procedure; recommended values are from 0.3 to 0.5.
On the second tab, "Program settings", the optional "Show startup window" flag can be cleared.
The third tab, "Associations" (Fig. 9), sets the threshold values of "Confidence" and "Support" that are used for the final selection of associations into the required basket.
Fig. 9. Setting Up Parameters for Selection of
Associations
By pressing the "Analyze data" button on the toolbar, start the wizard that searches for associations in the initial data. A dialog box appears for selecting the item with which associations will be searched for in all transactions (Fig. 10). The left column lists all items available in the Data Table, and the right column gives their absolute frequencies. Select the "CD" product, the one occurring most frequently in transactions, and press the OK button. The table of associations found with the "CD" item appears on the screen (Fig. 11).
Fig. 10. Selection of Item to Search Associations with
Fig. 11. Associations with "CD" Item
As the table shows, 3 associations were found that have one hundred percent "Confidence"; individually they cover approximately 29 to 37% of all transactions containing the "CD" product.
The Big Basket system makes it possible to view in detail the transactions covered by any found association or group of associations. To do so, select the desired association (or group of associations) in the table and press the "View details" button. For instance, select the first association found and press "View details". The "Associations details" window appears on the screen. In this window select the "Data matrix" tab, where all transactions covered by the first association are marked in a dark color (Fig. 12).
Fig. 12. Transactions Covered by the First Association "sugar and soya sauce => cd"
If we select all three associations, the "Data matrix" tab looks as follows (Fig. 13):
Fig. 13. Transactions Covered by All Three Associations
An important
characteristic of the Big Basket system is its graphical display of the market basket as a set of high-accuracy associations. To see it, press the View Basket (graphical display) button on the toolbar. The consumer's basket window appears on the screen (Fig. 14).
The left field displays the commodities with which associations were searched for and the associations actually found. The middle of the window shows a camomile-shaped diagram with the "CD" commodity at its center. The basket support level is indicated at the top left of the "camomile" field: the first number gives the support with respect to all transactions, and the second number (in brackets) gives it with respect to the transactions that include the central commodity "CD". To the right of the diagram is a column with the names of the commodities included in the constructed basket (for convenient graphical display these commodities are assigned the indexes C00, C01, ..., C11).
Fig. 14. Graphical Display of the Commodities Basket
Found
Below the "camomile" is a bar chart showing how often (as a percentage) each commodity entered the associations found. The outer circles of the "camomile" are colored according to this percentage; the color scale is explained to the left of the "camomile".
On the whole, as we can see, the basket found covers 47% of the transactions with the "CD" commodity. This basket is made up of three associations with 100% accuracy. Thus, in the analyzed data, a customer who buys the itemset on the left-hand side of any of these associations buys the "CD" commodity as well with 100% accuracy.
If we click on any association in the left field (in our case one of three), the commodities included in the selected association are shown in bold type in the graphical display of the basket.
When selecting the commodity for which associations will be searched, it is also possible to run the search for all commodities. To do that, set the Analyze all items switch in the Select Item dialog box (Fig. 10). For the example considered, the following table of associations (Fig. 15) and the corresponding graphical information (Fig. 16) are then displayed.
Fig. 15. All Discovered Associations
Fig. 16. All Consumer Baskets
By selecting the required commodity in the left field of the graphical display, the user is free to choose the basket that suits them best for whatever reason.
5. Demo version
A maximum of 500 transactions and 50 products can be analyzed. The Save Project and Load Project functions are also disabled.
6. System Requirements
Minimum
system configuration requirements for Big Basket operation:
Microsoft Windows 95 or later;
Pentium processor, 100 MHz or faster;
32 MB RAM.
Created by MaxMaster, 2003-2004