HI All In .install file, in hook_schema unction, we are mentioning index field and primary key field.
I am wondering, how to choose which one should be index and which one should be primary key. Does that affect the performance (in search, add, delete of records).
What I am thinking at this point of time, my database will contain job seekers name, mail-id, contact number, skill set (as comma separated values) and resume.
For index purpose, I am thinking I will use '$user->uid' as Index and and 'name' as primary key.
I am bit confused how it it is going to affect search/add/delete time (assuming huge number of records in database) and also database management.
Please guide me.
Regards Austin
Le jeudi 17 mars 2011 à 16:56 +0530, Austin Einter a écrit :
HI All In .install file, in hook_schema unction, we are mentioning index field and primary key field.
I am wondering, how to choose which one should be index and which one should be primary key. Does that affect the performance (in search, add, delete of records).
What I am thinking at this point of time, my database will contain job seekers name, mail-id, contact number, skill set (as comma separated values) and resume.
For index purpose, I am thinking I will use '$user->uid' as Index and and 'name' as primary key.
I am bit confused how it it is going to affect search/add/delete time (assuming huge number of records in database) and also database management.
Please guide me.
Regards Austin
Primary key semantically means "Unique not null and indexed".
So, primary key is an indexed, not null, and unique field. But, primary key can be composed by more than one field (therefore the unique constraint goes to the fields tuple).
Integer performances, for keys and indexes, are always way better than varchars for querying and joining. Plus there are some existing known bugs with MySQL and index on varchars, si I'd rather use integer identifiers when I can. You can basically use index on other varchar fields when you need it to boost specific queries.
Indexed integers for basic CRUD operations seems to be a wize choice, it will probably be the most common operations done over your dataset.
Pierre.
Primary keys mean that there is one and only one value for each record. In a resume table that you posted earlier that means a person can have one and only one resume with name. I would typically reverse this. The uid of a user doesn't change, but it's name does, so you don't want to be using name as a primary key. Use uid as primary key and only put an index on name if you're talking about commonly looking up by or sorting by name. Primary keys usually imply an index behind it but also add requiring one and only one value for each record. You need the index to find out if a record already exists quickly. A primary key typically can be thought of as an index with a uniqueness constraint.
Indexes generally dramatically improve lookup time in large volume databases at the expense of taking a little longer to write records.
I would strongly recommend that you pick up a book on database design to learn these concepts.
________________________________
From: support-bounces@drupal.org [mailto:support-bounces@drupal.org] On Behalf Of Austin Einter Sent: Thursday, March 17, 2011 4:26 AM To: development@drupal.org Cc: support@drupal.org Subject: [support] Doubt on Index and Primary key
HI All
In .install file, in hook_schema unction, we are mentioning index field and primary key field.
I am wondering, how to choose which one should be index and which one should be primary key.
Does that affect the performance (in search, add, delete of records).
What I am thinking at this point of time, my database will contain job seekers name, mail-id, contact number, skill set (as comma separated values) and resume.
For index purpose, I am thinking I will use '$user->uid' as Index and and 'name' as primary key.
I am bit confused how it it is going to affect search/add/delete time (assuming huge number of records in database) and also database management.
Please guide me.
Regards
Austin
Just to add to what David and Pierre wrote:
Especially for a large table, proper indexing is very important. In practical terms, you want to have an index on the fields after the "where" keyword in MySQL. For example: If the query would be "where name='whatever'", there should be an index on "name"; however, if a frequent query would be "where name='whatever' and city='New York', then it might be better for the index to comprise both. (create index indexname on jobs (name,city);), and the single column index would not be necessary anymore.
You can see the indices of your table using "show index from jobs;". As David pointed out, the primary key is indexed automatically. And yes, get a good book.
Ursula
On Thu, Mar 17, 2011 at 8:30 AM, Metzler, David metzlerd@evergreen.edu wrote:
Primary keys mean that there is one and only one value for each record. In a resume table that you posted earlier that means a person can have one and only one resume with name. I would typically reverse this. The uid of a user doesn’t change, but it’s name does, so you don’t want to be using name as a primary key. Use uid as primary key and only put an index on name if you’re talking about commonly looking up by or sorting by name. Primary keys usually imply an index behind it but also add requiring one and only one value for each record. You need the index to find out if a record already exists quickly. A primary key typically can be thought of as an index with a uniqueness constraint.
Indexes generally dramatically improve lookup time in large volume databases at the expense of taking a little longer to write records.
I would strongly recommend that you pick up a book on database design to learn these concepts.
From: support-bounces@drupal.org [mailto:support-bounces@drupal.org] On Behalf Of Austin Einter Sent: Thursday, March 17, 2011 4:26 AM To: development@drupal.org Cc: support@drupal.org Subject: [support] Doubt on Index and Primary key
HI All
In .install file, in hook_schema unction, we are mentioning index field and primary key field.
I am wondering, how to choose which one should be index and which one should be primary key.
Does that affect the performance (in search, add, delete of records).
What I am thinking at this point of time, my database will contain job seekers name, mail-id, contact number, skill set (as comma separated values) and resume.
For index purpose, I am thinking I will use '$user->uid' as Index and and 'name' as primary key.
I am bit confused how it it is going to affect search/add/delete time (assuming huge number of records in database) and also database management.
Please guide me.
Regards
Austin
-- [ Drupal support list | http://lists.drupal.org/ ]
Hi David, Ursula, Pierre I really appreciate your valuable input in this regard.
I did some search on Google/Drupal about indexing. Here are my understanding. Please correct me if wrong.
1. Indexing are done per coulmn basis in a table. 2. It helps during search operation to locate a particular row having same coulmn value as that of search string.
I am yet to find, how to acheive it. I assume, I need to modify hook_schema API in my module install file. Assuming I want to index the 'name' coulmn, probably I need to add below line in hook_schema (towards end of API).
'indexes' => array('name' => array('name'),),
Please correct me if my assumption is wrong. *The next BIG question - * I have two fields namely expert skill set and average skill set , those contain expert level and average level skills of an individual. My seacrh logic will be based on skill set.
Lets take an example. Say in row 10, expert skill set = PHP, Web2.0, Android, IPhone, HTML Say in row 10, average skill set = Database, TCL/TK, Pearl, C, C++, Java
(Please note that skill sets are comma separated).
When a potential recruiter searches resumes for Android , then I want all the candidates resumes with Android skill set should be shown in a view in tabular format, each page showing say 25 rows.
Now in this case, coulmn value is "PHP, Web2.0, Android, IPhone, HTML". And search string is "Android".
So will search query will retreive the row 10? My concern is coulmn value and search string are not a complete match, but a partial match.
Best Regards Austin
On Thu, Mar 17, 2011 at 11:09 PM, Ursula Pieper dramamezzo@gmail.comwrote:
Just to add to what David and Pierre wrote:
Especially for a large table, proper indexing is very important. In practical terms, you want to have an index on the fields after the "where" keyword in MySQL. For example: If the query would be "where name='whatever'", there should be an index on "name"; however, if a frequent query would be "where name='whatever' and city='New York', then it might be better for the index to comprise both. (create index indexname on jobs (name,city);), and the single column index would not be necessary anymore.
You can see the indices of your table using "show index from jobs;". As David pointed out, the primary key is indexed automatically. And yes, get a good book.
Ursula
On Thu, Mar 17, 2011 at 8:30 AM, Metzler, David metzlerd@evergreen.edu wrote:
Primary keys mean that there is one and only one value for each record.
In
a resume table that you posted earlier that means a person can have one
and
only one resume with name. I would typically reverse this. The uid of a user doesn’t change, but it’s name does, so you don’t want to be using
name
as a primary key. Use uid as primary key and only put an index on name
if
you’re talking about commonly looking up by or sorting by name. Primary
keys
usually imply an index behind it but also add requiring one and only one value for each record. You need the index to find out if a record
already
exists quickly. A primary key typically can be thought of as an index
with a
uniqueness constraint.
Indexes generally dramatically improve lookup time in large volume
databases
at the expense of taking a little longer to write records.
I would strongly recommend that you pick up a book on database design to learn these concepts.
From: support-bounces@drupal.org [mailto:support-bounces@drupal.org] On Behalf Of Austin Einter Sent: Thursday, March 17, 2011 4:26 AM To: development@drupal.org Cc: support@drupal.org Subject: [support] Doubt on Index and Primary key
HI All
In .install file, in hook_schema unction, we are mentioning index field
and
primary key field.
I am wondering, how to choose which one should be index and which one
should
be primary key.
Does that affect the performance (in search, add, delete of records).
What I am thinking at this point of time, my database will contain job seekers name, mail-id, contact number, skill set (as comma separated
values)
and resume.
For index purpose, I am thinking I will use '$user->uid' as Index and and 'name' as primary key.
I am bit confused how it it is going to affect search/add/delete time (assuming huge number of records in database) and also database
management.
Please guide me.
Regards
Austin
-- [ Drupal support list | http://lists.drupal.org/ ]
-- [ Drupal support list | http://lists.drupal.org/ ]
Le vendredi 18 mars 2011 à 07:55 +0530, Austin Einter a écrit :
Hi David, Ursula, Pierre I really appreciate your valuable input in this regard.
I did some search on Google/Drupal about indexing. Here are my understanding. Please correct me if wrong.
- Indexing are done per coulmn basis in a table.
- It helps during search operation to locate a particular row having
same coulmn value as that of search string.
And it also help for JOIN statements and such, but yes it is.
I am yet to find, how to acheive it. I assume, I need to modify hook_schema API in my module install file. Assuming I want to index the 'name' coulmn, probably I need to add below line in hook_schema (towards end of API).
'indexes' => array('name' => array('name'),),
You are right. That's probably also what the documentation say :) If you forgot an index you can add it later in a hook_update_N() implementation so don't worry you don't have your indexes to be perfect on the first try.
You can also do multi-column indexes, which can help (depending on the DBMS you use) for queries that uses multiple WHERE.
You don't have to put indexes everywhere, just put some over the field you will use for querying or joining.
Please correct me if my assumption is wrong.
The next BIG question - I have two fields namely expert skill set and average skill set , those contain expert level and average level skills of an individual. My seacrh logic will be based on skill set.
Lets take an example. Say in row 10, expert skill set = PHP, Web2.0, Android, IPhone, HTML Say in row 10, average skill set = Database, TCL/TK, Pearl, C, C++, Java
(Please note that skill sets are comma separated).
When a potential recruiter searches resumes for Android , then I want all the candidates resumes with Android skill set should be shown in a view in tabular format, each page showing say 25 rows.
Now in this case, coulmn value is "PHP, Web2.0, Android, IPhone, HTML". And search string is "Android".
So will search query will retreive the row 10? My concern is coulmn value and search string are not a complete match, but a partial match.
Here you should probably normalize your schema, which means store the skills into their own table, and provide a n <-> n relation table between your data table and the skill table. You can then query doing a JOIN statements which will probably be more natural for developers, faster, avoid string comparison (really slower than integers).
Doing schema normalization is not only good for performances, but it will also avoid redundancy.
But, I won't teach you the full relational model theory, you should start there maybe before doing SQL. Google would give you excellent theorycal explaination documents that may provide you beginner and or advanded theory and technical documentation.
Pierre.
Pierre is spot on here.
That is why most dbas would advise against storing this data in a comma separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with
On a separate note, you do understand that the site that you're talking about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you?
Dave -----Original Message----- From: support-bounces@drupal.org [mailto:support-bounces@drupal.org] On Behalf Of Pierre Rineau Sent: Friday, March 18, 2011 6:45 AM To: support@drupal.org Subject: Re: [support] Doubt on Index and Primary key
Le vendredi 18 mars 2011 à 07:55 +0530, Austin Einter a écrit :
Hi David, Ursula, Pierre I really appreciate your valuable input in this regard.
I did some search on Google/Drupal about indexing. Here are my understanding. Please correct me if wrong.
- Indexing are done per coulmn basis in a table.
- It helps during search operation to locate a particular row having
same coulmn value as that of search string.
And it also help for JOIN statements and such, but yes it is.
I am yet to find, how to acheive it. I assume, I need to modify hook_schema API in my module install file. Assuming I want to index the 'name' coulmn, probably I need to add below line in hook_schema (towards end of API).
'indexes' => array('name' => array('name'),),
You are right. That's probably also what the documentation say :) If you forgot an index you can add it later in a hook_update_N() implementation so don't worry you don't have your indexes to be perfect on the first try.
You can also do multi-column indexes, which can help (depending on the DBMS you use) for queries that uses multiple WHERE.
You don't have to put indexes everywhere, just put some over the field you will use for querying or joining.
Please correct me if my assumption is wrong.
The next BIG question - I have two fields namely expert skill set and average skill set , those contain expert level and average level skills of an individual. My seacrh logic will be based on skill set.
Lets take an example. Say in row 10, expert skill set = PHP, Web2.0, Android, IPhone, HTML Say in row 10, average skill set = Database, TCL/TK, Pearl, C, C++, Java
(Please note that skill sets are comma separated).
When a potential recruiter searches resumes for Android , then I want all the candidates resumes with Android skill set should be shown in a view in tabular format, each page showing say 25 rows.
Now in this case, coulmn value is "PHP, Web2.0, Android, IPhone, HTML". And search string is "Android".
So will search query will retreive the row 10? My concern is coulmn value and search string are not a complete match, but a partial match.
Here you should probably normalize your schema, which means store the skills into their own table, and provide a n <-> n relation table between your data table and the skill table. You can then query doing a JOIN statements which will probably be more natural for developers, faster, avoid string comparison (really slower than integers).
Doing schema normalization is not only good for performances, but it will also avoid redundancy.
But, I won't teach you the full relational model theory, you should start there maybe before doing SQL. Google would give you excellent theorycal explaination documents that may provide you beginner and or advanded theory and technical documentation.
Pierre.
Le vendredi 18 mars 2011 à 08:15 -0700, Metzler, David a écrit :
Pierre is spot on here.
That is why most dbas would advise against storing this data in a comma separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with
On a separate note, you do understand that the site that you're talking about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you?
Dave
Dave is right about the fact this simple business stuff could be done in many ways using D6 existing modules (even only with core and taxonomy) or D7 fields.
But, if you really want to learn technical aspects of SQL and/or Drupal development, this is a good thing to start with this kind of simple business stuff.
Pierre.
Hi Pierre, David, Ursula Thanks for excellent piece of information. I just went through basic database concepts like indexing, join, normalisation and tried to analyse how can I apply these for my job registration site implementation.
After understanding a bit on normalisation and join, I have comeup with below approach for this specific case.
Instead of having a single table, and comma separated values in table cells , I am going to split it multiple tables. I am attaching a table.xls file , please have a look.
In that excel sheet, I have the main table, and I have broken the main table into 9 different tables. But I hope it need to be broken into more number of tables, depends on how many different kind of work domains are there. It may go to 50+ tables.
Examples for work domains are - Web, PSTN, VoIP, NetworkManagement, GSM, Datbase, BoardDesign etc.
So how many work domains are there, those many tables will be there. In those tables, a coulmn will represent a particular skill set.
Say under Web Tables, coulmns can be HTML, PHP, Web2.0, Drupal, ASP, etc And under NetworkManagement table, NMS and SNMP can be coulmns.
In table cells, I will keep either 1 or 0, depending on the person has that skill or not.
Example: Lets say User 2 knows NMS and SNMP, User 3 knows only SNMP, User4 knows only SNMP, then the table will look as below.
*UID* NMS SNMP 2 1 1 3 0 1 4 0 1
As first time I am doing this, I might be wrong. If so, kindly let me know.
Best Regards Austin.
On Fri, Mar 18, 2011 at 8:54 PM, Pierre Rineau < pierre.rineau@makina-corpus.com> wrote:
Le vendredi 18 mars 2011 à 08:15 -0700, Metzler, David a écrit :
Pierre is spot on here.
That is why most dbas would advise against storing this data in a comma
separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with
On a separate note, you do understand that the site that you're talking
about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you?
Dave
Dave is right about the fact this simple business stuff could be done in many ways using D6 existing modules (even only with core and taxonomy) or D7 fields.
But, if you really want to learn technical aspects of SQL and/or Drupal development, this is a good thing to start with this kind of simple business stuff.
Pierre.
-- [ Drupal support list | http://lists.drupal.org/ ]
Austin,
I think you're off to a good start. I might suggest that you do something about all of those skill-set tables. You could replace them with three tables and still have the information in easy reach:
- A skillGroup table with -- sgID - an integer primary key -- name - name of the group, like 'Web Skills', 'Database'
- A skills table with -- sID - an integer primary key -- sgID - ID of the skillGroup that the skill belongs to -- skill - the name of the skill, like 'PHP' or 'PostgreSql'
- A skillset table with -- UID - your user ID -- sID - ID of the skill being rated -- skillRating - 0 or 1 or whatever
The skillset entry links to the skill via the sID and to the user table (Table 1) with the UID. The skill table is then linked to the skillGroup via the sgID. This way you can add skill groups and skills as need without needing to create more tables.
Good Luck,
Pat
On 3/24/2011 6:58 PM, Austin Einter wrote:
Hi Pierre, David, Ursula Thanks for excellent piece of information. I just went through basic database concepts like indexing, join, normalisation and tried to analyse how can I apply these for my job registration site implementation. After understanding a bit on normalisation and join, I have comeup with below approach for this specific case. Instead of having a single table, and comma separated values in table cells , I am going to split it multiple tables. I am attaching a table.xls file , please have a look. In that excel sheet, I have the main table, and I have broken the main table into 9 different tables. But I hope it need to be broken into more number of tables, depends on how many different kind of work domains are there. It may go to 50+ tables. Examples for work domains are - Web, PSTN, VoIP, NetworkManagement, GSM, Datbase, BoardDesign etc. So how many work domains are there, those many tables will be there. In those tables, a coulmn will represent a particular skill set. Say under Web Tables, coulmns can be HTML, PHP, Web2.0, Drupal, ASP, etc And under NetworkManagement table, NMS and SNMP can be coulmns. In table cells, I will keep either 1 or 0, depending on the person has that skill or not. Example: Lets say User 2 knows NMS and SNMP, User 3 knows only SNMP, User4 knows only SNMP, then the table will look as below. *UID* NMS SNMP 2 1 1 3 0 1 4 0 1
As first time I am doing this, I might be wrong. If so, kindly let me know. Best Regards Austin.
On Fri, Mar 18, 2011 at 8:54 PM, Pierre Rineau <pierre.rineau@makina-corpus.com mailto:pierre.rineau@makina-corpus.com> wrote:
Le vendredi 18 mars 2011 à 08:15 -0700, Metzler, David a écrit : > Pierre is spot on here. > > That is why most dbas would advise against storing this data in a comma separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with > > On a separate note, you do understand that the site that you're talking about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you? > > Dave Dave is right about the fact this simple business stuff could be done in many ways using D6 existing modules (even only with core and taxonomy) or D7 fields. But, if you really want to learn technical aspects of SQL and/or Drupal development, this is a good thing to start with this kind of simple business stuff. Pierre. -- [ Drupal support list | http://lists.drupal.org/ ]
Hi Pat Thanks for detail help. I really appreciate it. I did put the tables in excel sheet. It looks fine.
But the only worry is skillset table row count.
Assuming on average, one person has 10 different skills, then skillset table will have 10 entries per user. Assuming total 50,000,00 users (May be a higher number , but I want to design keeping this number high), then total number of rows in skillset table will be 50,000,00 x 10 = 50,000,000 which is really high. Is this going to be a bottle neck from MySql perspective, search time perspective??
I am attaching the excel sheet (please where I have put the tables as per suggestion), please have a look.
Best Regards Austin
On Fri, Mar 25, 2011 at 2:27 PM, Pat Johnston Pat@melrosecenter.com wrote:
Austin,
I think you're off to a good start. I might suggest that you do something about all of those skill-set tables. You could replace them with three tables and still have the information in easy reach:
- A skillGroup table with
-- sgID - an integer primary key -- name - name of the group, like 'Web Skills', 'Database'
- A skills table with
-- sID - an integer primary key -- sgID - ID of the skillGroup that the skill belongs to -- skill - the name of the skill, like 'PHP' or 'PostgreSql'
- A skillset table with
-- UID - your user ID -- sID - ID of the skill being rated -- skillRating - 0 or 1 or whatever
The skillset entry links to the skill via the sID and to the user table (Table 1) with the UID. The skill table is then linked to the skillGroup via the sgID. This way you can add skill groups and skills as need without needing to create more tables.
Good Luck,
Pat
On 3/24/2011 6:58 PM, Austin Einter wrote:
Hi Pierre, David, Ursula Thanks for excellent piece of information. I just went through basic database concepts like indexing, join, normalisation and tried to analyse how can I apply these for my job registration site implementation.
After understanding a bit on normalisation and join, I have comeup with below approach for this specific case.
Instead of having a single table, and comma separated values in table cells , I am going to split it multiple tables. I am attaching a table.xls file , please have a look.
In that excel sheet, I have the main table, and I have broken the main table into 9 different tables. But I hope it need to be broken into more number of tables, depends on how many different kind of work domains are there. It may go to 50+ tables.
Examples for work domains are - Web, PSTN, VoIP, NetworkManagement, GSM, Datbase, BoardDesign etc.
So how many work domains are there, those many tables will be there. In those tables, a coulmn will represent a particular skill set.
Say under Web Tables, coulmns can be HTML, PHP, Web2.0, Drupal, ASP, etc And under NetworkManagement table, NMS and SNMP can be coulmns.
In table cells, I will keep either 1 or 0, depending on the person has that skill or not.
Example: Lets say User 2 knows NMS and SNMP, User 3 knows only SNMP, User4 knows only SNMP, then the table will look as below.
*UID* NMS SNMP 2 1 1 3 0 1 4 0 1As first time I am doing this, I might be wrong. If so, kindly let me know.
Best Regards Austin.
On Fri, Mar 18, 2011 at 8:54 PM, Pierre Rineau < pierre.rineau@makina-corpus.com> wrote:
Le vendredi 18 mars 2011 à 08:15 -0700, Metzler, David a écrit :
Pierre is spot on here.
That is why most dbas would advise against storing this data in a comma
separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with
On a separate note, you do understand that the site that you're talking
about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you?
Dave
Dave is right about the fact this simple business stuff could be done in many ways using D6 existing modules (even only with core and taxonomy) or D7 fields.
But, if you really want to learn technical aspects of SQL and/or Drupal development, this is a good thing to start with this kind of simple business stuff.
Pierre.
-- [ Drupal support list | http://lists.drupal.org/ ]
-- [ Drupal support list | http://lists.drupal.org/ ]
Austin,
I haven't dealt with high row counts with MySQL - I don't know for sure. But the rows are small so you're not using too much space. I would think it should be OK with some performance tuning. I'd suggest indexing the Skillset table by UID and sID.That and maybe some decent hardware configuration should be able to handle it.
Any comments from the MySQL people?
Regards,
Pat
On 3/25/2011 7:24 PM, Austin Einter wrote:
Hi Pat Thanks for detail help. I really appreciate it. I did put the tables in excel sheet. It looks fine. But the only worry is skillset table row count. Assuming on average, one person has 10 different skills, then skillset table will have 10 entries per user. Assuming total 50,000,00 users (May be a higher number , but I want to design keeping this number high), then total number of rows in skillset table will be 50,000,00 x 10 = 50,000,000 which is really high. Is this going to be a bottle neck from MySql perspective, search time perspective?? I am attaching the excel sheet (please where I have put the tables as per suggestion), please have a look. Best Regards Austin
On Fri, Mar 25, 2011 at 2:27 PM, Pat Johnston <Pat@melrosecenter.com mailto:Pat@melrosecenter.com> wrote:
Austin, I think you're off to a good start. I might suggest that you do something about all of those skill-set tables. You could replace them with three tables and still have the information in easy reach: - A skillGroup table with -- sgID - an integer primary key -- name - name of the group, like 'Web Skills', 'Database' - A skills table with -- sID - an integer primary key -- sgID - ID of the skillGroup that the skill belongs to -- skill - the name of the skill, like 'PHP' or 'PostgreSql' - A skillset table with -- UID - your user ID -- sID - ID of the skill being rated -- skillRating - 0 or 1 or whatever The skillset entry links to the skill via the sID and to the user table (Table 1) with the UID. The skill table is then linked to the skillGroup via the sgID. This way you can add skill groups and skills as need without needing to create more tables. Good Luck, Pat On 3/24/2011 6:58 PM, Austin Einter wrote:Hi Pierre, David, Ursula Thanks for excellent piece of information. I just went through basic database concepts like indexing, join, normalisation and tried to analyse how can I apply these for my job registration site implementation. After understanding a bit on normalisation and join, I have comeup with below approach for this specific case. Instead of having a single table, and comma separated values in table cells , I am going to split it multiple tables. I am attaching a table.xls file , please have a look. In that excel sheet, I have the main table, and I have broken the main table into 9 different tables. But I hope it need to be broken into more number of tables, depends on how many different kind of work domains are there. It may go to 50+ tables. Examples for work domains are - Web, PSTN, VoIP, NetworkManagement, GSM, Datbase, BoardDesign etc. So how many work domains are there, those many tables will be there. In those tables, a coulmn will represent a particular skill set. Say under Web Tables, coulmns can be HTML, PHP, Web2.0, Drupal, ASP, etc And under NetworkManagement table, NMS and SNMP can be coulmns. In table cells, I will keep either 1 or 0, depending on the person has that skill or not. Example: Lets say User 2 knows NMS and SNMP, User 3 knows only SNMP, User4 knows only SNMP, then the table will look as below. *UID* NMS SNMP 2 1 1 3 0 1 4 0 1 As first time I am doing this, I might be wrong. If so, kindly let me know. Best Regards Austin. On Fri, Mar 18, 2011 at 8:54 PM, Pierre Rineau <pierre.rineau@makina-corpus.com <mailto:pierre.rineau@makina-corpus.com>> wrote: Le vendredi 18 mars 2011 à 08:15 -0700, Metzler, David a écrit : > Pierre is spot on here. > > That is why most dbas would advise against storing this data in a comma separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with > > On a separate note, you do understand that the site that you're talking about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you? > > Dave Dave is right about the fact this simple business stuff could be done in many ways using D6 existing modules (even only with core and taxonomy) or D7 fields. But, if you really want to learn technical aspects of SQL and/or Drupal development, this is a good thing to start with this kind of simple business stuff. Pierre. -- [ Drupal support list | http://lists.drupal.org/ ]-- [ Drupal support list | http://lists.drupal.org/ ]
Pat describes roughly what I had in mind in much greater detail. You should not be worried about the row counts, here. This solution scales much better than the one you have purposed, especially for large row counts.
I have 3 million row tables containing more data than this sitting on my completely unoptimized laptop that I query regularly query. When the queries are properly indexed, they return in milliseconds.
I will say that your obsession with scale is a bit misplaced in your first database design project. Don't sacrifice normalization for scale until you really know what you're doing around database design.... probably a minimum of 5 years experience under your belt before you start making those kinds of decisions.
-----Original Message----- From: support-bounces@drupal.org on behalf of Pat Johnston Sent: Fri 3/25/2011 20:00 To: support@drupal.org Subject: Re: [support] Doubt on Index and Primary key
Austin,
I haven't dealt with high row counts with MySQL - I don't know for sure. But the rows are small so you're not using too much space. I would think it should be OK with some performance tuning. I'd suggest indexing the Skillset table by UID and sID.That and maybe some decent hardware configuration should be able to handle it.
Any comments from the MySQL people?
Regards,
Pat
On 3/25/2011 7:24 PM, Austin Einter wrote:
Hi Pat Thanks for detail help. I really appreciate it. I did put the tables in excel sheet. It looks fine. But the only worry is skillset table row count. Assuming on average, one person has 10 different skills, then skillset table will have 10 entries per user. Assuming total 50,000,00 users (May be a higher number , but I want to design keeping this number high), then total number of rows in skillset table will be 50,000,00 x 10 = 50,000,000 which is really high. Is this going to be a bottle neck from MySql perspective, search time perspective?? I am attaching the excel sheet (please where I have put the tables as per suggestion), please have a look. Best Regards Austin
On Fri, Mar 25, 2011 at 2:27 PM, Pat Johnston <Pat@melrosecenter.com mailto:Pat@melrosecenter.com> wrote:
Austin, I think you're off to a good start. I might suggest that you do something about all of those skill-set tables. You could replace them with three tables and still have the information in easy reach: - A skillGroup table with -- sgID - an integer primary key -- name - name of the group, like 'Web Skills', 'Database' - A skills table with -- sID - an integer primary key -- sgID - ID of the skillGroup that the skill belongs to -- skill - the name of the skill, like 'PHP' or 'PostgreSql' - A skillset table with -- UID - your user ID -- sID - ID of the skill being rated -- skillRating - 0 or 1 or whatever The skillset entry links to the skill via the sID and to the user table (Table 1) with the UID. The skill table is then linked to the skillGroup via the sgID. This way you can add skill groups and skills as need without needing to create more tables. Good Luck, Pat On 3/24/2011 6:58 PM, Austin Einter wrote:Hi Pierre, David, Ursula Thanks for excellent piece of information. I just went through basic database concepts like indexing, join, normalisation and tried to analyse how can I apply these for my job registration site implementation. After understanding a bit on normalisation and join, I have comeup with below approach for this specific case. Instead of having a single table, and comma separated values in table cells , I am going to split it multiple tables. I am attaching a table.xls file , please have a look. In that excel sheet, I have the main table, and I have broken the main table into 9 different tables. But I hope it need to be broken into more number of tables, depends on how many different kind of work domains are there. It may go to 50+ tables. Examples for work domains are - Web, PSTN, VoIP, NetworkManagement, GSM, Datbase, BoardDesign etc. So how many work domains are there, those many tables will be there. In those tables, a coulmn will represent a particular skill set. Say under Web Tables, coulmns can be HTML, PHP, Web2.0, Drupal, ASP, etc And under NetworkManagement table, NMS and SNMP can be coulmns. In table cells, I will keep either 1 or 0, depending on the person has that skill or not. Example: Lets say User 2 knows NMS and SNMP, User 3 knows only SNMP, User4 knows only SNMP, then the table will look as below. *UID* NMS SNMP 2 1 1 3 0 1 4 0 1 As first time I am doing this, I might be wrong. If so, kindly let me know. Best Regards Austin. On Fri, Mar 18, 2011 at 8:54 PM, Pierre Rineau <pierre.rineau@makina-corpus.com <mailto:pierre.rineau@makina-corpus.com>> wrote: Le vendredi 18 mars 2011 à 08:15 -0700, Metzler, David a écrit : > Pierre is spot on here. > > That is why most dbas would advise against storing this data in a comma separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with > > On a separate note, you do understand that the site that you're talking about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you? > > Dave Dave is right about the fact this simple business stuff could be done in many ways using D6 existing modules (even only with core and taxonomy) or D7 fields. But, if you really want to learn technical aspects of SQL and/or Drupal development, this is a good thing to start with this kind of simple business stuff. Pierre. -- [ Drupal support list | http://lists.drupal.org/ ]-- [ Drupal support list | http://lists.drupal.org/ ]
Hi David Thanks a lot. I am going ahead with the solution Pat proposed. I had asked this question in MySql forum, they also liked the solution. Best Regards Austin On Tue, Mar 29, 2011 at 12:47 AM, Metzler, David metzlerd@evergreen.eduwrote:
Pat describes roughly what I had in mind in much greater detail. You should not be worried about the row counts, here. This solution scales much better than the one you have purposed, especially for large row counts.
I have 3 million row tables containing more data than this sitting on my completely unoptimized laptop that I query regularly query. When the queries are properly indexed, they return in milliseconds.
I will say that your obsession with scale is a bit misplaced in your first database design project. Don't sacrifice normalization for scale until you really know what you're doing around database design.... probably a minimum of 5 years experience under your belt before you start making those kinds of decisions.
-----Original Message----- From: support-bounces@drupal.org on behalf of Pat Johnston Sent: Fri 3/25/2011 20:00 To: support@drupal.org Subject: Re: [support] Doubt on Index and Primary key
Austin,
I haven't dealt with high row counts with MySQL - I don't know for sure. But the rows are small so you're not using too much space. I would think it should be OK with some performance tuning. I'd suggest indexing the Skillset table by UID and sID.That and maybe some decent hardware configuration should be able to handle it.
Any comments from the MySQL people?
Regards,
Pat
On 3/25/2011 7:24 PM, Austin Einter wrote:
Hi Pat Thanks for detail help. I really appreciate it. I did put the tables in excel sheet. It looks fine. But the only worry is skillset table row count. Assuming on average, one person has 10 different skills, then skillset table will have 10 entries per user. Assuming total 50,000,00 users (May be a higher number , but I want to design keeping this number high), then total number of rows in skillset table will be 50,000,00 x 10 = 50,000,000 which is really high. Is this going to be a bottle neck from MySql perspective, search time perspective?? I am attaching the excel sheet (please where I have put the tables as per suggestion), please have a look. Best Regards Austin
On Fri, Mar 25, 2011 at 2:27 PM, Pat Johnston <Pat@melrosecenter.com mailto:Pat@melrosecenter.com> wrote:
Austin, I think you're off to a good start. I might suggest that you do something about all of those skill-set tables. You could replace them with three tables and still have the information in easy reach: - A skillGroup table with -- sgID - an integer primary key -- name - name of the group, like 'Web Skills', 'Database' - A skills table with -- sID - an integer primary key -- sgID - ID of the skillGroup that the skill belongs to -- skill - the name of the skill, like 'PHP' or 'PostgreSql' - A skillset table with -- UID - your user ID -- sID - ID of the skill being rated -- skillRating - 0 or 1 or whatever The skillset entry links to the skill via the sID and to the user table (Table 1) with the UID. The skill table is then linked to the skillGroup via the sgID. This way you can add skill groups and skills as need without needing to create more tables. Good Luck, Pat On 3/24/2011 6:58 PM, Austin Einter wrote:Hi Pierre, David, Ursula Thanks for excellent piece of information. I just went through basic database concepts like indexing, join, normalisation and tried to analyse how can I apply these for my job registration site implementation. After understanding a bit on normalisation and join, I have comeup with below approach for this specific case. Instead of having a single table, and comma separated values in table cells , I am going to split it multiple tables. I am attaching a table.xls file , please have a look. In that excel sheet, I have the main table, and I have broken the main table into 9 different tables. But I hope it need to be broken into more number of tables, depends on how many different kind of work domains are there. It may go to 50+ tables. Examples for work domains are - Web, PSTN, VoIP, NetworkManagement, GSM, Datbase, BoardDesign etc. So how many work domains are there, those many tables will be there. In those tables, a coulmn will represent a particular skill set. Say under Web Tables, coulmns can be HTML, PHP, Web2.0, Drupal, ASP, etc And under NetworkManagement table, NMS and SNMP can be coulmns. In table cells, I will keep either 1 or 0, depending on the person has that skill or not. Example: Lets say User 2 knows NMS and SNMP, User 3 knows only SNMP, User4 knows only SNMP, then the table will look as below. *UID* NMS SNMP 2 1 1 3 0 1 4 0 1 As first time I am doing this, I might be wrong. If so, kindly let me know. Best Regards Austin. On Fri, Mar 18, 2011 at 8:54 PM, Pierre Rineau <pierre.rineau@makina-corpus.com <mailto:pierre.rineau@makina-corpus.com>> wrote: Le vendredi 18 mars 2011 à 08:15 -0700, Metzler, David a écrit : > Pierre is spot on here. > > That is why most dbas would advise against storing this data in a comma separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with > > On a separate note, you do understand that the site that you're talking about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you? > > Dave Dave is right about the fact this simple business stuff could be done in many ways using D6 existing modules (even only with core and taxonomy) or D7 fields. But, if you really want to learn technical aspects of SQL and/or Drupal development, this is a good thing to start with this kind of simple business stuff. Pierre. -- [ Drupal support list | http://lists.drupal.org/ ]-- [ Drupal support list | http://lists.drupal.org/ ]-- [ Drupal support list | http://lists.drupal.org/ ]
Think about how you would write a query to say who has the skill set x? You can't really do it, cause you've got to change what table you select based on what you're looking for? How much php code will you have to change to a). Add a skill set? B.) add a work domain?
Rather design a table for user skill sets with the columns:
Uid, skill_set_id
And a lookup table to define skill sets and their categories and their relavent work domains:
Skill_set_id, skill_set_descr, work_domain
If you need to create 50 tables for modeling a concept you probably aren't normalized. There are other approaches, but this is probably the simplest.
Dave
________________________________
From: support-bounces@drupal.org [mailto:support-bounces@drupal.org] On Behalf Of Austin Einter Sent: Thursday, March 24, 2011 6:58 PM To: support@drupal.org; pierre.rineau@makina-corpus.com Subject: Re: [support] Doubt on Index and Primary key
Hi Pierre, David, Ursula
Thanks for excellent piece of information. I just went through basic database concepts like indexing, join, normalisation and tried to analyse how can I apply these for my job registration site implementation.
After understanding a bit on normalisation and join, I have comeup with below approach for this specific case.
Instead of having a single table, and comma separated values in table cells , I am going to split it multiple tables.
I am attaching a table.xls file , please have a look.
In that excel sheet, I have the main table, and I have broken the main table into 9 different tables. But I hope it need to be broken into more number of tables, depends on how many different kind of work domains are there. It may go to 50+ tables.
Examples for work domains are - Web, PSTN, VoIP, NetworkManagement, GSM, Datbase, BoardDesign etc.
So how many work domains are there, those many tables will be there. In those tables, a coulmn will represent a particular skill set.
Say under Web Tables, coulmns can be HTML, PHP, Web2.0, Drupal, ASP, etc
And under NetworkManagement table, NMS and SNMP can be coulmns.
In table cells, I will keep either 1 or 0, depending on the person has that skill or not.
Example: Lets say User 2 knows NMS and SNMP, User 3 knows only SNMP, User4 knows only SNMP, then the table will look as below.
UID
NMS
SNMP
2
1
1
3
0
1
4
0
1
As first time I am doing this, I might be wrong. If so, kindly let me know.
Best Regards
Austin.
On Fri, Mar 18, 2011 at 8:54 PM, Pierre Rineau pierre.rineau@makina-corpus.com wrote:
Le vendredi 18 mars 2011 à 08:15 -0700, Metzler, David a écrit :
Pierre is spot on here.
That is why most dbas would advise against storing this data in a comma separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with
On a separate note, you do understand that the site that you're talking about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you?
Dave
Dave is right about the fact this simple business stuff could be done in many ways using D6 existing modules (even only with core and taxonomy) or D7 fields.
But, if you really want to learn technical aspects of SQL and/or Drupal development, this is a good thing to start with this kind of simple business stuff.
Pierre.
-- [ Drupal support list | http://lists.drupal.org/ ]
Hi Dave I am not sure if I understood your suggestion properly. I am writing below my understanding. I have given the table example below, if wrong, please correct me.
Please see my response inline.
Regards Kamal
On Fri, Mar 25, 2011 at 9:48 PM, Metzler, David metzlerd@evergreen.eduwrote:
Think about how you would write a query to say who has the skill set x? You can’t really do it, cause you’ve got to change what table you select based on what you’re looking for? How much php code will you have to change to a). Add a skill set? B.) add a work domain?
Rather design a table for user skill sets with the columns:
Uid, skill_set_id
So the table will look as below.
*UID* *PHP* *HTML* *VOIP* *SQL* *MYSQL* 1 1 0 1 0 1 2 0 1 1 1 0 3 1 1 0 1 0 4 0 0 0 0 0
If this is the case, table coulmn will keep increasing.
And a lookup table to define skill sets and their categories and their relavent work domains:
Skill_set_id, skill_set_descr, work_domain
Is this example holds good?
Skill ID Skill desc work domain PHP Good in PHP Web HTML Average in HTML Web VOIP Average in VoIP VoIP SQL Good in SQL Database MYSQL Good in MySql Database
If you need to create 50 tables for modeling a concept you probably aren’t normalized. There are other approaches, but this is probably the simplest.
Dave
*From:* support-bounces@drupal.org [mailto:support-bounces@drupal.org] *On Behalf Of *Austin Einter *Sent:* Thursday, March 24, 2011 6:58 PM *To:* support@drupal.org; pierre.rineau@makina-corpus.com
*Subject:* Re: [support] Doubt on Index and Primary key
Hi Pierre, David, Ursula
Thanks for excellent piece of information. I just went through basic database concepts like indexing, join, normalisation and tried to analyse how can I apply these for my job registration site implementation.
After understanding a bit on normalisation and join, I have comeup with below approach for this specific case.
Instead of having a single table, and comma separated values in table cells , I am going to split it multiple tables.
I am attaching a table.xls file , please have a look.
In that excel sheet, I have the main table, and I have broken the main table into 9 different tables. But I hope it need to be broken into more number of tables, depends on how many different kind of work domains are there. It may go to 50+ tables.
Examples for work domains are - Web, PSTN, VoIP, NetworkManagement, GSM, Datbase, BoardDesign etc.
So how many work domains are there, those many tables will be there. In those tables, a coulmn will represent a particular skill set.
Say under Web Tables, coulmns can be HTML, PHP, Web2.0, Drupal, ASP, etc
And under NetworkManagement table, NMS and SNMP can be coulmns.
In table cells, I will keep either 1 or 0, depending on the person has that skill or not.
Example: Lets say User 2 knows NMS and SNMP, User 3 knows only SNMP, User4 knows only SNMP, then the table will look as below.
*UID*
NMS
SNMP
2
1
1
3
0
1
4
0
1
As first time I am doing this, I might be wrong. If so, kindly let me know.
Best Regards
Austin.
On Fri, Mar 18, 2011 at 8:54 PM, Pierre Rineau < pierre.rineau@makina-corpus.com> wrote:
Le vendredi 18 mars 2011 à 08:15 -0700, Metzler, David a écrit :
Pierre is spot on here.
That is why most dbas would advise against storing this data in a comma
separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with
On a separate note, you do understand that the site that you're talking
about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you?
Dave
Dave is right about the fact this simple business stuff could be done in many ways using D6 existing modules (even only with core and taxonomy) or D7 fields.
But, if you really want to learn technical aspects of SQL and/or Drupal development, this is a good thing to start with this kind of simple business stuff.
Pierre.
-- [ Drupal support list | http://lists.drupal.org/ ]
-- [ Drupal support list | http://lists.drupal.org/ ]