sparks fly away than butterfly can fly itself.

An intro to database models

Databases are a logical way to store a collection of information, anything from user records to a personal diary. Data can be stored in different structures called models, and data scientists often need to transform their data to fit the model that stores it best. Models are chosen based on how easy they make it to interact with the data, but just how many different database models are there? In data science, there are three main types to choose from: the relational model, the document model, and the graph model.

Relational Model

In this model, data is stored in tables with fixed structures and data types. Each table has a primary key, a column whose values uniquely identify each row. When the primary key of one table appears in another table, it is called a foreign key, and it references the matching row in the original table. This is how tables are connected to each other. The most popular way to work with relational databases is Structured Query Language, better known as SQL.
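To make primary and foreign keys concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any relational database. The company and person tables, their columns, and the sample rows are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for any relational database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical tables: each company row is identified by its primary key.
conn.execute("""
    CREATE TABLE company (
        company_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    )
""")
# person.company_id is a foreign key that points back at company.company_id.
conn.execute("""
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        company_id INTEGER REFERENCES company(company_id)
    )
""")

conn.execute("INSERT INTO company VALUES (1, 'Acme')")
conn.execute("INSERT INTO person VALUES (10, 'Ada', 1)")

def insert_is_rejected(connection, row):
    """Return True if the database refuses a person row with a bad foreign key."""
    try:
        connection.execute("INSERT INTO person VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False

# Company 99 does not exist, so the foreign key constraint blocks this row.
rejected = insert_is_rejected(conn, (11, "Bob", 99))
person_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

The second insert fails because the foreign key has nothing to point at, which is the kind of consistency check that regulated table structures give you.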

An Introduction to SQL

SQL is used to communicate with a relational database, where each column has a declared data type. Choosing the correct data type reduces memory use while still allowing the range of inputs you need. To follow this example in Python, you first need a data frame. You can either read in a CSV file or create your own data frame.

Begin by opening Python. One way to interact with Python is through a Jupyter notebook, but there are many different ways to communicate with SQL using Python. If you have never installed pymysql or sqlalchemy, install them first (for example, with pip install pymysql sqlalchemy). Make sure to restart your kernel after installing and before running the import statements.
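The original install and import cell is not reproduced here, so the following is only a sketch of what such a cell might look like. The package list (including pandas for the data frame) and the missing_packages helper are assumptions, not the author's code.

```python
# Uncomment the next line in a notebook cell if the packages are missing,
# then restart the kernel before running the imports.
# %pip install pymysql sqlalchemy pandas

import importlib.util

# Packages this walkthrough assumes; pandas is included for the data frame.
REQUIRED = ["pandas", "pymysql", "sqlalchemy"]

def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    import pandas as pd
    import pymysql
    from sqlalchemy import create_engine
```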

Next, create a database and load your data into it. In this example, we will be using MySQL Workbench. Inside MySQL Workbench, you can create a new database by clicking the icon that looks like a stack of coins, which creates a new schema. Apply the schema, and it should appear in your list of schemas. Then load your data and set up the connection to your database.
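Setting up the connection could look roughly like the sketch below, assuming SQLAlchemy with the pymysql driver and a pandas data frame. The credentials, host, and the people_db schema name are placeholders, and mysql_url and load_frame are hypothetical helper names.

```python
from urllib.parse import quote_plus

def mysql_url(user, password, host, database, port=3306):
    """Build a SQLAlchemy URL for MySQL through the pymysql driver."""
    # quote_plus escapes characters such as '@' or '/' inside the password.
    return f"mysql+pymysql://{user}:{quote_plus(password)}@{host}:{port}/{database}"

def load_frame(df, table_name, url):
    """Write a pandas data frame to a table, replacing it if it exists."""
    # Imported here so the URL helper works even without sqlalchemy installed.
    from sqlalchemy import create_engine
    engine = create_engine(url)
    df.to_sql(table_name, con=engine, if_exists="replace", index=False)
    return engine

# Placeholder credentials; substitute your own before connecting.
url = mysql_url("root", "p@ss/word", "localhost", "people_db")
```

Calling load_frame(df, "people", url) would then write the data frame into the schema you created in MySQL Workbench, provided the MySQL server is running.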

Joins

One of the most important parts of working with multiple tables is joining them correctly. The type of join you use will vary depending on what you are trying to join, and what type of information you are storing in each table. If you have two tables A and B, different joins will impact how these two tables come together.

Outer join: This returns every row from both Table A and Table B, matching rows where possible and leaving the missing side empty (NULL) where there is no match.

Inner join: This returns only the rows that have a match in both tables A and B.

Left join: This returns every row in table A, along with the matching information from table B where it exists.

Right join: This returns every row in table B, along with the matching information from table A where it exists.
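The four joins can be compared side by side on two tiny tables. This sketch uses the built-in sqlite3 module rather than MySQL, and the tables a and b are made up. The right join is written as a left join with the tables swapped, and the outer join as a UNION of both left joins, since MySQL has no FULL OUTER JOIN keyword and older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, a_val TEXT);
    CREATE TABLE b (id INTEGER, b_val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

def rows(sql):
    """Run a query and return all of its rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Inner join: only id 2 appears in both tables.
inner = rows("SELECT a.id, a_val, b_val FROM a INNER JOIN b ON a.id = b.id")

# Left join: every row of a, with b's value where one exists.
left = rows("SELECT a.id, a_val, b_val FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id")

# Right join, emulated by swapping the tables in a left join.
right = rows("SELECT b.id, a_val, b_val FROM b LEFT JOIN a ON a.id = b.id ORDER BY b.id")

# Full outer join, emulated as the union of both left joins.
outer = rows("""
    SELECT a.id AS id, a_val, b_val FROM a LEFT JOIN b ON a.id = b.id
    UNION
    SELECT b.id AS id, a_val, b_val FROM b LEFT JOIN a ON a.id = b.id
    ORDER BY id
""")
```

Python shows the unmatched side as None, which is how sqlite3 represents SQL NULL.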

You can also filter and aggregate your data. The WHERE keyword specifies which rows you want to view, and built-in aggregate functions such as SUM summarize a column. SQL also recognizes basic math operations ( +, -, /, *, % ), so you can perform arithmetic on existing columns to create a new column from your data.
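Here is a short sketch of filtering, summing, and computing a derived column, again using sqlite3 as a stand-in. The orders table and its values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (item TEXT, price REAL, quantity INTEGER);
    INSERT INTO orders VALUES ('pen', 2.0, 10), ('book', 12.5, 2), ('lamp', 30.0, 1);
""")

# WHERE keeps only the rows that satisfy the condition.
cheap_items = conn.execute(
    "SELECT item FROM orders WHERE price < 20 ORDER BY item"
).fetchall()

# SUM aggregates a column across the selected rows.
total_quantity = conn.execute("SELECT SUM(quantity) FROM orders").fetchone()[0]

# Arithmetic on existing columns produces a new, derived column.
line_totals = conn.execute(
    "SELECT item, price * quantity AS line_total FROM orders ORDER BY item"
).fetchall()
```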

Document Model

In this model, each data item is called a document and is stored independently of other documents. This means each document is fully self-contained. Unlike rows in a table, each document can have its own properties, since it does not have to follow a strict structure with rigid data types. One of the most popular document databases is MongoDB, often just called Mongo. In this example, we will be showing you commands and queries in MongoDB Compass.

An Introduction to Mongo

Each record in Mongo is called a document. These documents are very similar to JSON objects or Python dictionaries. Unlike a SQL table, a Mongo document can nest multiple layers of information. Everything in Mongo is stored in key: value pairs, where each key corresponds to a piece of information. To access the value, simply type dictionary["key"]. In a simple case, each column title can correspond to a key, while the entries in a row correspond to the values. Each document in Mongo is independent, which lets it embed other document-like objects within itself. This eliminates the need for joins, which can be computationally expensive. Instead of joining tables, you can embed more information under one key. For example, instead of a key that points to a separate table of company information, a document can store a dictionary holding the information from that table. Embedded documents provide a more efficient way to store related data, since it can all be accessed from one document.
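As a small illustration, here is what a document with an embedded document looks like as a plain Python dictionary. The field names and values are hypothetical.

```python
# A hypothetical person document with an embedded company document.
person = {
    "name": "Ada",
    "age": 36,
    "company": {                   # embedded document instead of a join
        "name": "Acme",
        "city": "Boston",
    },
    "schools": ["MIT", "Oxford"],  # values can also be lists
}

# Top-level value, then a value inside the embedded document.
name = person["name"]
company_city = person["company"]["city"]
```

The company information lives inside the person document, so reading it takes a second key lookup rather than a join.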

To begin using Mongo, you can create a new .ipynb file (Python notebook) and install the necessary packages.

Inserting data into Mongo from a data frame is not challenging since Python allows us to convert data frames into dictionaries. We will now insert our data frame from the previous example into Mongo.
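The conversion and insert might be sketched as follows. The sample rows are made up, frame_to_documents mimics what pandas produces with df.to_dict("records"), and insert_documents assumes the pymongo package and a MongoDB server on the default local port.

```python
def frame_to_documents(columns, rows):
    """Turn tabular data into one dictionary per row, like df.to_dict("records")."""
    return [dict(zip(columns, row)) for row in rows]

def insert_documents(documents, db_name, collection_name,
                     uri="mongodb://localhost:27017"):
    """Insert the documents into a MongoDB collection and return how many were stored."""
    # Imported lazily so the conversion step can be tried without pymongo.
    from pymongo import MongoClient
    client = MongoClient(uri)
    result = client[db_name][collection_name].insert_many(documents)
    return len(result.inserted_ids)

# Hypothetical data standing in for the data frame.
columns = ["name", "age", "company"]
rows = [("Ada", 36, "Acme"), ("Bob", 41, "Initech")]
documents = frame_to_documents(columns, rows)
```

With a real data frame, you would pass df.to_dict("records") straight to insert_documents.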

While querying with Mongo can seem challenging at first, it can be incredibly powerful. It helps to think of a Mongo query as accessing values in a JSON file or a deeply nested dictionary.
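To show the nested-dictionary way of thinking, the sketch below imitates a Mongo equality filter in plain Python. The get_path and find helpers are simplified stand-ins written for this example, not part of Mongo, and they only cover exact matches on dotted keys.

```python
def get_path(document, dotted_key):
    """Follow a dotted key such as "company.city" through nested dictionaries."""
    value = document
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

def find(documents, query):
    """Return documents whose fields equal every value in the query."""
    return [
        doc for doc in documents
        if all(get_path(doc, key) == wanted for key, wanted in query.items())
    ]

# Hypothetical documents with embedded company information.
people = [
    {"name": "Ada", "company": {"name": "Acme", "city": "Boston"}},
    {"name": "Bob", "company": {"name": "Initech", "city": "Austin"}},
]

# With pymongo the same filter would be written:
#     collection.find({"company.city": "Boston"})
matches = find(people, {"company.city": "Boston"})
```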

Graph Model

In this model, each data item is called a node, and it is directly linked to other nodes through named relationships called edges. In some cases nodes and edges can have properties, but this is case-specific to each graph model, and the data it is trying to store. One powerful graph database management system is called Neo4j.

An Introduction to Neo4j

Each node in a graph has a label that represents its type. For example, each node in our data would have the label PERSON since it represents one person. Each edge, which Neo4j calls a relationship, has a type and a direction. If a person attends a school, it can be represented by (:PERSON)-[:ATTENDS]->(:SCHOOL). This makes it very easy to depict relationships between different nodes. Both nodes and relationships can have properties associated with them. Graph models are incredibly flexible, efficient to traverse, and great at modeling relationships and networks.
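The pieces of a graph model can be sketched with two small Python classes. This is an illustration of the idea only, not how Neo4j stores data, and the Node, Edge, and describe names are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                      # the node's type, e.g. "PERSON"
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    start: Node                     # edges are directed: start -> end
    rel_type: str                   # e.g. "ATTENDS"
    end: Node
    properties: dict = field(default_factory=dict)

# Hypothetical data: one person attending one school.
ada = Node("PERSON", {"name": "Ada"})
mit = Node("SCHOOL", {"name": "MIT"})
attends = Edge(ada, "ATTENDS", mit, {"since": 2010})

def describe(edge):
    """Render an edge in Cypher-style arrow notation."""
    return f"(:{edge.start.label})-[:{edge.rel_type}]->(:{edge.end.label})"

pattern = describe(attends)
```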

When setting up Neo4j, we can continue to use a Python notebook. Begin by opening Neo4j, then create and name a new project. Next, select your project and click Add localDBMS. This will take a few minutes to set up. After specifying your password, click Create and wait for your project to load. Once loaded, hover over your graph and click Start. Note that you can only run one graph at a time, so make sure you stop this graph if you plan to work on another project. Once it has started, you should get a notification on your desktop that it is ready.

Next, we will make up some sample data so that we have nodes and relationships that are easy to navigate.

Now we will start entering this data into Neo4j. We will be creating nodes that represent people, with relationships to schools and companies. When writing Cypher commands, it is advised to refer to the Cypher browser. Your Cypher command then goes inside the graph.run statement.
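Entering the data could look something like this sketch. It assumes the py2neo package (the text's graph.run suggests it, but that is a guess), a local database on the default Bolt port, and placeholder credentials. The sample pairs and the load_attendance helper are hypothetical.

```python
# Hypothetical sample data: who attends which school.
attendance = [("Ada", "MIT"), ("Bob", "Oxford")]

# MERGE creates a node or relationship only if it does not already exist.
# $person and $school are query parameters filled in at run time.
CREATE_ATTENDS = (
    "MERGE (p:PERSON {name: $person}) "
    "MERGE (s:SCHOOL {name: $school}) "
    "MERGE (p)-[:ATTENDS]->(s)"
)

def load_attendance(pairs, uri="bolt://localhost:7687",
                    user="neo4j", password="password"):
    """Send one parameterized Cypher statement per (person, school) pair."""
    # py2neo is assumed here because the text refers to graph.run.
    from py2neo import Graph
    graph = Graph(uri, auth=(user, password))
    for person, school in pairs:
        graph.run(CREATE_ATTENDS, person=person, school=school)
```

Passing values as parameters instead of pasting them into the Cypher string avoids quoting problems when a name contains an apostrophe.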

For more challenging Cypher queries, you can use the built-in movies data provided by Neo4j. To view your graph and type Cypher directly into Neo4j, hover over your graph name and click open browser. Making queries will be much easier since we can now run them inside Neo4j.

One powerful query is the shortest path query. To try it on Neo4j's movies graph, enter the command

match p=shortestPath((bacon:Person{name:"Kevin Bacon"})-[*]-(meg:Person{name:"Meg Ryan"})) return p

which returns the shortest chain of relationships connecting the two people.

Cypher queries in Neo4j can be incredibly powerful, and the more you practice, the more natural the clunky syntax will become.
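To get a feel for what a shortest path query computes, here is a breadth-first search over a tiny made-up graph in plain Python. It illustrates the idea only and is not how Neo4j implements shortestPath. Edges are treated as undirected, matching the -[*]- pattern in the query above.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over undirected edges; returns a list of nodes or None."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back from goal to start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

# Made-up graph: A-B-C-D plus a shortcut A-E-D.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
path = shortest_path(edges, "A", "D")
```

The search finds the two-step route through E rather than the three-step route through B and C.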

Choosing the right model

There is no simple rule for choosing a model, but as our examples show, each type has its own pros and cons. Relational models are great for storing data across multiple tables, but joining these tables can be computationally expensive. Document models like Mongo can help bypass this problem, but it is not always possible to efficiently store your data in the JSON-like objects they require. Graph models offer a fun and powerful alternative by representing data as nodes with relationships. However, they can become difficult to navigate as you store more data with more complex relationships.

In the end, it is sometimes best to try each model and use your own intuition to choose which one is best for your data. You want a model that stores information in a logical manner, yet also has an interface that makes querying your data easy.

While there is no one-size-fits-all approach to choosing models, as a data scientist, it is your job to determine which model is best.

Why do databases matter?

Databases are a powerful way to store large amounts of information. Choosing the wrong database can make querying nearly impossible or computationally costly. Choosing the right model to fit the data is integral to a data scientist’s job. For this reason, it is important for all aspiring data scientists to become familiar with each model. Having these skills will not only make you better at your job, but it will also improve your problem-solving skills and intuition when it comes to properly storing data.
